How does it actually work?
HEXUS: How does the patch actually work? I'm guessing that you're having the game render into a 10:10:10:2 backbuffer (with maybe a format change for some intermediate rendertargets), so the hardware can MSAA resolve from that and then scan out. That about right?

Chuck @ ATI: Before I get into how the patch works, let me give you and your readers some background. When running in HDR mode, Elder Scrolls 4: Oblivion uses two 8-bit-per-component A8R8G8B8 surfaces for the flipping chain (front and back buffers) and a 16-bit-per-component, 64-bit-per-pixel floating-point (FP16) renderable texture for the HDR data. This FP16 buffer is where almost all of the rendering is done. Each frame, after drawing to the FP16 buffer, they bind it as a source texture and render to the back buffer, using shaders to map the high-dynamic-range floating-point data into the limited range of the back buffer. This is all good, but how does multi-sample anti-aliasing (MSAA) fit into this picture?
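To make that range-compression step concrete, here is a minimal sketch of what a tone-mapping operator does to one colour channel. This is purely illustrative: Oblivion's actual shaders are more elaborate, and the simple Reinhard-style curve x/(1+x) used here is an assumption, not the game's real operator.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical sketch: compress one HDR channel value (held here as a plain
// float, standing in for FP16) into the 0-255 range of an 8-bit back-buffer
// channel, using a simple Reinhard operator x/(1+x).
uint8_t tone_map_channel(float hdr) {
    float ldr = hdr / (1.0f + hdr);          // maps [0, inf) into [0, 1)
    ldr = std::clamp(ldr, 0.0f, 1.0f);       // guard against negative inputs
    return static_cast<uint8_t>(ldr * 255.0f + 0.5f);  // round to 8-bit
}
```

The point is just that arbitrarily bright FP16 values get squeezed monotonically into the back buffer's limited range, which is why the FP16 buffer has to exist at all.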
Currently the Radeon X1000 cards (X1300, X1600, X1800, X1900, etc) are the ONLY cards on the market that support MSAA on FP16 surfaces. This support is built into the hardware and enables both high performance and high quality AA. Benchmarks like 3DMark06 and games like Juiced and Serious Sam 2 already use FP16 MSAA on the X1000 cards and it looks great. These apps use the DX9 API to enable FP16 MSAA.
There is a slight catch, though. The DX9 API allows apps to create FP16 MSAA surfaces through CreateRenderTarget(), but those surfaces can't be used directly as textures. So while an app that doesn't support FP16 MSAA might create an FP16 texture, render to it, and then use it as a texture, an app that supports FP16 MSAA needs to create an FP16 texture and an FP16 MSAA render target, render into the MSAA buffer, copy the MSAA buffer into the texture, and then use the texture. Now this may sound like a big difference, but it's really not that big of a change. (When you think of how complicated the Oblivion engine is and how many hundreds of surfaces it uses, adding this shouldn't be hard.)
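The two app-side paths can be sketched in pseudocode. The D3D9 entry points and the D3DFMT_A16B16G16R16F format are real; the parameters are simplified and error handling is omitted:

```
// Without MSAA: one FP16 renderable texture, rendered to and sampled directly
CreateTexture(w, h, 1, D3DUSAGE_RENDERTARGET, D3DFMT_A16B16G16R16F, &tex)
// ... render scene into tex, then bind tex and tone-map to the back buffer

// With MSAA: an extra FP16 MSAA render target, plus a resolve copy
CreateRenderTarget(w, h, D3DFMT_A16B16G16R16F, D3DMULTISAMPLE_4_SAMPLES, &msaaRT)
CreateTexture(w, h, 1, D3DUSAGE_RENDERTARGET, D3DFMT_A16B16G16R16F, &tex)
// ... render scene into msaaRT instead ...
StretchRect(msaaRT, NULL, texSurface, NULL)   // driver resolves the samples here
// ... bind tex and tone-map as before
```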
There is a reason the DX9 API requires a copy from the multi-sample AA buffer to the normal single-sample texture. The hardware that fetches from the texture doesn't know how to combine the multiple samples in the MSAA buffer. It also doesn't make sense to do that work at fetch time, because if you fetch from the texture multiple times, you'd end up combining the samples multiple times, which would be inefficient. The copy from the MSAA buffer to the texture triggers the driver to do a resolve pass, which combines the multiple samples in the MSAA buffer into a single sample.
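Conceptually, the resolve pass is just a per-pixel average of the samples. Here is a self-contained toy model in plain C++ (floats standing in for FP16, a single channel, and a straight box filter; real hardware resolves per channel and may use a different filter):

```cpp
#include <cstddef>
#include <vector>

// Toy model of an MSAA resolve pass: the MSAA buffer stores `samples`
// consecutive values per pixel; the resolve averages each group into one
// value, producing the single-sample image the texture units can fetch.
std::vector<float> resolve_msaa(const std::vector<float>& msaa, int samples) {
    std::vector<float> resolved(msaa.size() / samples);
    for (std::size_t p = 0; p < resolved.size(); ++p) {
        float sum = 0.0f;
        for (int s = 0; s < samples; ++s)
            sum += msaa[p * samples + s];
        resolved[p] = sum / samples;   // combine samples into one value
    }
    return resolved;
}
```

Doing this once, at copy time, is exactly what avoids re-combining samples on every texture fetch.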
So after all that, I can answer how the patch actually works. There is no rendering into a 10:10:10:2 backbuffer, and yes, the driver will still pass WHQL. The simple explanation is that I made the driver behave exactly as if the app had actually added support for FP16 MSAA.
The technical details
Chuck @ ATI: Here are the gory details. When you force on MSAA through the Catalyst Control Center, our driver has a special "ForceAA" path that allocates an MSAA buffer and binds it to the flip chain (back buffer). That causes all rendering to the flip chain to actually go into the MSAA buffer, so you get higher quality. This doesn't work for Oblivion, because most of the rendering doesn't go into the flip chain, so you don't see any difference. In my patch, I made a special ForceAA path that enables MSAA on textures instead of the flip chain.

Currently, this path is only enabled for Oblivion, under Catalyst AI app detection. The exciting thing is that this ForceAA path could be enabled for other games in the future, like Far Cry and Splinter Cell, so those games could get HDR+AA too! In my special ForceAA path, I detect when the correct FP16 renderable texture is created, then allocate a separate FP16 MSAA buffer and bind it to the texture. From that point on, all rendering the app does to the FP16 texture actually goes into the FP16 MSAA buffer instead.
When the app is done rendering into the FP16 buffer and attempts to bind it as a source texture, I tell the hardware to combine the multiple samples from the MSAA buffer and put the anti-aliased image into the FP16 texture that the app created. The app then reads from the texture as usual. All of the data is full-quality FP16 from start to finish. We don't do any shortcuts or swap formats behind the app's back.
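Pulling the two answers together, the driver-side logic Chuck describes might look roughly like this. This is pure pseudocode with invented helper names (the actual driver internals are not public):

```
OnCreateTexture(desc):
    tex = normal_create_texture(desc)
    if forceAA_enabled and app == Oblivion and desc.format == FP16_RENDERABLE:
        tex.shadow_msaa = allocate_msaa_buffer(desc, aa_level)  // hidden MSAA buffer
    return tex

OnSetRenderTarget(tex):
    // redirect the app's rendering into the hidden MSAA buffer, if one exists
    hw_set_render_target(tex.shadow_msaa ? tex.shadow_msaa : tex)

OnSetTexture(tex):
    if tex.shadow_msaa and tex.shadow_msaa.dirty:
        hw_resolve(tex.shadow_msaa, tex)   // combine samples into the app's texture
    hw_set_texture(tex)
```

The app never sees the MSAA buffer; it creates, renders to, and samples from the same FP16 texture it always did.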
(Note: When you hit TAB to equip weapons, etc., that scene is not rendered into the FP16 surface, so it will not get AA even with the patch. You can see how aliased your character looks and compare that to how good the game looks.)
Answering some common questions
Chuck @ ATI: This driver can be installed on all Radeon cards, not just X1000 cards, but HDR and HDR+AA are only supported in Oblivion on X1000 cards. You don't need Crossfire to get HDR+AA in Oblivion. If you have Crossfire, this driver will show gains on all Crossfire configurations. To get HDR+AA, force on AA in ATI's Catalyst Control Center, and set Oblivion to HDR mode with no AA.

HEXUS: How long did it take to engineer?
Chuck @ ATI: On Friday, I spent a few hours studying the game's rendering path. Over the weekend I thought about how I could design an HDR+AA path for Oblivion without breaking other programs. On Monday, I wrote all the code and started testing it. I'd say about 12 hours total for designing and coding.

Summary
So Chuck basically has the game render into an FP16 MSAA-able rendertarget (with the driver triggering the resolve), and then, when Oblivion wants to texture from it, the driver copies the resolved surface into the texture space the app thinks it's sampling from. He doesn't have it downsample to 10:10:10:2 as I first thought (swapping surface formats behind any app's back is usually a bad idea); it stays FP16 (per channel, of course) throughout. It also makes you wonder why Bethesda have been hesitant to add it themselves: allocate the MSAA-able rendertarget, render into that, then copy to the bound texture (a StretchRect after a global CheckDeviceFormatConversion, maybe, to detect driver support) before further sampling.
Having it as an Xbox 360-only feature seems the most reasonable explanation, provided Bethesda weren't simply worried about performance overhead or rendertarget limitations.
So there you have it. Chuck at least earns a few days off to get some game time in, and a spin in one of Joe's Modenese machines. Cheers to Terry 'just a glass of water please' Makedon for the opportunity to ask Chuck the questions!