3DMark05's new rendering engine
What they target
3DMark05 is a complete DirectX 9.0c test, running on nothing less than full Shader Model 2.0 hardware. So if your hardware driver doesn't advertise full DX9.0 Shader Model 2.0 compatibility, you're out of luck with 3DMark05.
Fragment/Pixel Shader Profiles
3DMark05 uses the ps2_0 (ATI R(V)3xx), ps2_a (NVIDIA NV3x), ps2_b (ATI R(V)4xx) and ps3_0 (NVIDIA NV4x) HLSL profiles for running fragment shaders on compatible hardware. So your ATI Radeon 9500, 9600, 9700, 9800, X600, X700 or X800 board will run just fine, using either the ps2_0 or the ps2_b profile depending on the hardware generation.
NVIDIA GeForce FX and GeForce 6 users are also in luck, getting access to the ps2_0, ps2_a, ps2_b and ps3_0 profiles depending on their hardware generation. GeForce 6 supports the lot, while FX can run 3DMark05's ps2_0, ps2_a and ps2_b fragment shader programs (ps2_b is effectively a subset of ps2_a functionality).
Other DX9.0 hardware gets in on the game too, with access to the base ps2_0 profile. Such hardware includes Intel's GMA900 integrated VGA core, XGI's Volari and various S3 'Chrome parts.
Vertex Shader Profiles
Three vertex shader compiler profiles are available in DX9.0c: vs2_0, vs2_a and vs3_0. There's no vs2_b profile to match the fragment shader profile defined for ATI's R(V)4xx hardware; that hardware uses vs2_0 instead. GeForce FX and GeForce 6 follow the same rules I outlined above for the fragment shader profiles, so FX gets vs2_0 and vs2_a, and GeForce 6 gets all three.
How their new shader engine works
3DMark05's new shader engine works something like this. The application detects the hardware, chooses the most applicable shader profiles for the vertex and pixel shaders, compiles the shaders on the CPU so they're ready for execution, then runs them on the hardware using the art and geometry assets for each game test to render the scenes.
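As a rough sketch of the "choose the most applicable profiles" step, here is the profile support described above expressed as a lookup table, with a preference order that picks one vertex and one pixel profile per hardware family. The family labels, the preference order and choose_profiles() are hypothetical, since 3DMark05's real selection logic isn't public. Note also that the actual HLSL compiler targets are spelled with underscores (ps_2_0, ps_2_a, vs_3_0 and so on).

```python
# Profiles each hardware family can use, as described in the text.
# Family labels and the preference order are hypothetical illustrations;
# 3DMark05's real selection logic is not public.
SUPPORTED = {
    "ATI R(V)3xx": {"vs": {"vs2_0"}, "ps": {"ps2_0"}},
    "ATI R(V)4xx": {"vs": {"vs2_0"}, "ps": {"ps2_0", "ps2_b"}},
    "NVIDIA NV3x": {"vs": {"vs2_0", "vs2_a"}, "ps": {"ps2_0", "ps2_a", "ps2_b"}},
    "NVIDIA NV4x": {"vs": {"vs2_0", "vs2_a", "vs3_0"},
                    "ps": {"ps2_0", "ps2_a", "ps2_b", "ps3_0"}},
    "Other DX9": {"vs": {"vs2_0"}, "ps": {"ps2_0"}},
}

# Assumed order of preference, most capable profile first.
VS_ORDER = ["vs3_0", "vs2_a", "vs2_0"]
PS_ORDER = ["ps3_0", "ps2_a", "ps2_b", "ps2_0"]


def choose_profiles(family):
    """Return the (vertex, pixel) profile pair to compile shaders against."""
    caps = SUPPORTED[family]
    vs = next(p for p in VS_ORDER if p in caps["vs"])
    ps = next(p for p in PS_ORDER if p in caps["ps"])
    return vs, ps


for family in SUPPORTED:
    print(family, choose_profiles(family))
```

With this ordering an R(V)4xx board ends up on vs2_0 plus ps2_b, and a GeForce FX on vs2_a plus ps2_a, matching the pairings described above.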
The shaders are constructed dynamically, according to the compiler profile being used, from what can reasonably be called micro-shaders: small building blocks that get grouped together to form full shader programs. Think about how a CPU processes instructions. Many instructions are broken down into smaller component parts called micro-ops, which together perform the one instruction: read from a certain memory location, write a register, and so on. A similar thing happens with GPU shaders: read from a texture, write to a GPU register, etc.
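A minimal sketch of the idea, assuming a hypothetical fragment library in which each micro-shader is a line of HLSL-style source with optional profile-specific variants. The fragment names, the normalisation cube map variant for ps2_0 and build_shader() are all illustrative assumptions, and the strings are not a complete, compilable shader.

```python
# Hypothetical library of "micro-shader" fragments: one line of HLSL-style
# source each, with optional profile-specific variants. Illustrative only;
# these strings do not form a complete, compilable shader.
FRAGMENTS = {
    "sample_normal": {
        "default": "float3 n = tex2D(normalMap, uv).xyz * 2 - 1;",
    },
    "normalize": {
        # Assumed variant: a normalisation cube map lookup for ps2_0 parts.
        "ps2_0": "n = texCUBE(normCube, n).xyz * 2 - 1;",
        "default": "n = normalize(n);",
    },
    "diffuse": {
        "default": "float4 c = tex2D(albedoMap, uv) * saturate(dot(n, lightDir));",
    },
    "output": {
        "default": "return c;",
    },
}


def build_shader(steps, profile):
    """Stitch fragments into one pixel shader body for the given profile."""
    lines = []
    for step in steps:
        variants = FRAGMENTS[step]
        # Prefer a profile-specific variant, fall back to the generic one.
        lines.append("    " + variants.get(profile, variants["default"]))
    body = "\n".join(lines)
    return "float4 main(float2 uv : TEXCOORD0) : COLOR\n{\n" + body + "\n}"


print(build_shader(["sample_normal", "normalize", "diffuse", "output"], "ps2_0"))
```

The same list of steps yields a different program per profile, which is the point: the building blocks stay fixed and only the pieces that benefit from a particular profile get swapped.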
Futuremark expect most new game engines to build their shaders this way, if they aren't doing so already: pick from a library of building blocks, piece them together on the CPU, and fire them at the GPU for processing. If they're right, and it looks like they are, dynamic shader construction targeting different hardware is where game engine design is heading.
Here's where it gets interesting: in 3DMark05, no matter which shader profile is used, the output is the same within an invisible margin of error. So the shader profiles are there purely for speed. In future games, developers might choose to enable or disable visual effects based on the shader hardware being used, but in 3DMark05 that's not the case. The reason is performance alone, but it's entirely valid nonetheless: if certain hardware runs faster with a certain shader profile and still outputs a correct image, there's no reason not to let it use that profile.
A new shadowing system
3DMark03's stencil shadow method is most visible in Game Test 2, The Battle For Proxycon. Similar to Doom3's shadow renderer, shadow volumes are extruded (on the CPU) from occluding geometry and rendered into the stencil buffer; any geometry that falls inside a volume is in shadow. Pretty simple in principle, but hard to implement robustly and at high speed, especially with complex geometry.
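For comparison with the new approach, the core of the stencil test can be sketched for a single view ray: count the shadow-volume faces the ray crosses in front of the visible surface. This is the classic "z-pass" counting scheme, shown as an illustration of the general technique rather than 3DMark03's code; in_shadow_zpass() and its input layout are assumptions.

```python
def in_shadow_zpass(fragment_depth, volume_faces):
    """Classic z-pass stencil count for one view ray.

    volume_faces: (depth, is_front_facing) for each shadow-volume polygon
    the ray crosses. Faces nearer than the visible surface pass the depth
    test: front faces increment the stencil count, back faces decrement it.
    A non-zero count means the surface lies inside at least one volume.
    """
    stencil = 0
    for depth, front_facing in volume_faces:
        if depth < fragment_depth:
            stencil += 1 if front_facing else -1
    return stencil != 0


# One shadow volume spanning depths 2 to 5 along the ray.
volume = [(2.0, True), (5.0, False)]
print(in_shadow_zpass(3.0, volume))  # True: surface inside the volume
print(in_shadow_zpass(6.0, volume))  # False: surface behind it
```

Part of the robustness problem shows up even here: z-pass counting goes wrong when the camera itself sits inside a volume, which is why robust implementations tend to switch to the "z-fail" variant.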
3DMark05 instead uses a new unified lighting and shadow system that handles self-shadowing, soft shadows and other desirable lighting and shadow effects. The lighting system uses perspective shadow maps (PSMs). In a first pass, the engine renders the scene into a depth buffer from the position and direction, and therefore the perspective, of every light source that's allowed to cast shadows. Then, when the scene is rendered from the position and direction of the viewer, each pixel fragment about to be drawn is transformed from the viewer's perspective into the light source's perspective using a matrix transformation. Comparing the fragment's depth in light space against the value sampled from the depth map created earlier (the shadow map) tells you whether the fragment is in shadow. Filtering that comparison over neighbouring samples tells you how much it's in shadow, so you modify the fragment's colour value accordingly.
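The second pass boils down to a matrix transform and a depth comparison. The sketch below assumes a light whose combined view-projection matrix is already known (an identity matrix is enough for a quick test), a tiny list-of-lists shadow map holding depths in [0, 1], and a small depth bias; the helper names are hypothetical, and it leaves out the perspective reparameterisation that turns an ordinary shadow map into a perspective shadow map.

```python
def transform(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_texel(ndc, size):
    """Map a coordinate in [-1, 1] to a clamped texel index in [0, size)."""
    t = int((ndc * 0.5 + 0.5) * size)
    return min(max(t, 0), size - 1)


def shadow_test(world_pos, light_view_proj, shadow_map, bias=1e-3):
    """Return 1.0 if the point is lit, 0.0 if it is in shadow (one sample).

    shadow_map[v][u] holds the depth, in [0, 1], of the closest surface the
    light saw through that texel during the first (light's-eye) pass.
    """
    x, y, z, w = transform(light_view_proj, [*world_pos, 1.0])
    x, y, z = x / w, y / w, z / w          # perspective divide
    size = len(shadow_map)
    u, v = to_texel(x, size), to_texel(y, size)
    # Lit if nothing nearer to the light was recorded at this texel; the
    # bias avoids false self-shadowing caused by limited depth precision.
    return 1.0 if z - bias <= shadow_map[v][u] else 0.0


IDENTITY = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
smap = [[1.0] * 4 for _ in range(4)]
smap[2][2] = 0.5                           # an occluder halfway to the light
print(shadow_test((0.1, 0.1, 0.8), IDENTITY, smap))    # 0.0, behind the occluder
print(shadow_test((-0.9, -0.9, 0.8), IDENTITY, smap))  # 1.0, nothing in the way
```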
You need one large depth map (shadow map) per light source. For directional lights, 3DMark05 uses a single-channel 32-bit floating-point texture, 2048x2048 pixels in size (DirectX's R32F pixel format). A 2Kx2K depth map at 32 bits per pixel is a 16MB object in memory. However, if the target hardware supports depth stencil textures (DSTs), an accelerated way of rendering and sampling depth maps in a shadowing system like this, 3DMark05 uses them instead, with the D24X8 pixel format (a 32-bit integer format holding 24 bits of depth, with the remaining 8 bits unused). DSTs aren't a supported part of DirectX 9.0, so on hardware that supports them (GeForce3 and higher currently, but in the case of 3DMark05, GeForce FX or higher), 3DMark05 uses NVIDIA's vendor-specific implementation, outside of DirectX's standard interfaces, to take advantage of those accelerated depth maps.
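The 16MB figure is straightforward arithmetic: 2048 x 2048 texels at 4 bytes each. Both formats spend 32 bits per texel, so switching to a DST doesn't save memory; the gain is in how the hardware handles it. A quick check, assuming no mipmaps (texture_bytes() is a hypothetical helper):

```python
MIB = 1024 * 1024


def texture_bytes(width, height, bytes_per_texel, faces=1):
    """Uncompressed size of a texture with no mipmaps."""
    return width * height * bytes_per_texel * faces


r32f_map = texture_bytes(2048, 2048, 4)    # R32F: one 32-bit float per texel
d24x8_map = texture_bytes(2048, 2048, 4)   # D24X8: 24-bit depth + 8 unused bits
print(r32f_map / MIB, d24x8_map / MIB)     # 16.0 16.0
```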
Whether DSTs make it into future versions of DirectX is currently under discussion between Microsoft and the hardware vendors that supply DirectX-compatible hardware, but they're unsupported at the time of writing. For point lights, which cast light in all directions from a single position, 3DMark05 uses a smaller cube map texture (six 512x512 faces), again in the R32F pixel format, which is a supported item in DirectX.
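Because a point light shines in every direction, the shadow lookup first has to work out which of the six faces to read, and six 512x512 R32F faces come to 6MB. A sketch assuming the cube map stores each occluder's distance from the light (3DMark05 may store a different depth quantity), with hypothetical function names:

```python
import math

CUBE_MAP_BYTES = 512 * 512 * 4 * 6   # six 512x512 R32F faces = 6 MiB


def cube_face(direction):
    """Pick the cube-map face a light-to-fragment direction falls on.

    The face is chosen by the component with the largest magnitude (the
    major-axis rule); ties are broken arbitrarily here.
    """
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "+X" if x >= 0 else "-X"
    if ay >= az:
        return "+Y" if y >= 0 else "-Y"
    return "+Z" if z >= 0 else "-Z"


def point_light_lit(frag_pos, light_pos, stored_distance, bias=1e-3):
    """Lit if the fragment is no farther from the light than the occluder
    distance stored in the cube map texel along that direction."""
    return math.dist(frag_pos, light_pos) - bias <= stored_distance


print(cube_face((0.2, -0.9, 0.1)))   # -Y
```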
Also, to sample the depth map, Futuremark use two different filtering methods depending on the vendor hardware being used. On ATI hardware, the sampling is done in a fragment program, effectively doing the filtering yourself without explicit help from the hardware. On NVIDIA hardware, you get the filtering for free. Their filtering, called Percentage Closer Filtering (PCF), takes four samples from the depth texture, compares each against the fragment's depth, and returns a weighted average of those comparison results to the fragment program. You get that in a single cycle on their hardware. The alternative method (which runs on ATI hardware at all times, or on GeForce FX and GeForce 6-series if you turn DSTs, and therefore PCF, off) is an averaged, point-sampled filter computed in the fragment program. That's a multi-cycle sequence of instructions on all DX9 hardware, so it's slower than NVIDIA's single-cycle PCF, and the different sampling method produces different output quality.
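To make the difference concrete, the sketch below contrasts the two. pcf_bilinear() models hardware PCF as it is commonly described: compare the 2x2 texels around the lookup position with the fragment's depth, then blend the 0/1 results bilinearly. averaged_point_samples() is a shader-style alternative that point-samples four texels and averages the comparisons with equal weights. The four-tap pattern, coordinates measured in texels and both function names are assumptions, not Futuremark's documented sample pattern.

```python
import math


def _lit(depth_map, x, y, ref_depth):
    """1.0 if the clamped texel (x, y) does not occlude ref_depth, else 0.0."""
    size = len(depth_map)
    x = min(max(x, 0), size - 1)
    y = min(max(y, 0), size - 1)
    return 1.0 if ref_depth <= depth_map[y][x] else 0.0


def pcf_bilinear(depth_map, u, v, ref_depth):
    """Compare the 2x2 texels around (u, v), then blend the results bilinearly.
    (u, v) are in texel units; texel i has its centre at i + 0.5."""
    x0, y0 = math.floor(u - 0.5), math.floor(v - 0.5)
    fx, fy = (u - 0.5) - x0, (v - 0.5) - y0
    top = (_lit(depth_map, x0, y0, ref_depth) * (1 - fx)
           + _lit(depth_map, x0 + 1, y0, ref_depth) * fx)
    bottom = (_lit(depth_map, x0, y0 + 1, ref_depth) * (1 - fx)
              + _lit(depth_map, x0 + 1, y0 + 1, ref_depth) * fx)
    return top * (1 - fy) + bottom * fy


def averaged_point_samples(depth_map, u, v, ref_depth,
                           offsets=((-0.5, -0.5), (0.5, -0.5),
                                    (-0.5, 0.5), (0.5, 0.5))):
    """Point-sample several texels, compare each, average with equal weights."""
    total = sum(_lit(depth_map, math.floor(u + du), math.floor(v + dv), ref_depth)
                for du, dv in offsets)
    return total / len(offsets)


# 2x2 depth map whose right-hand column is occluded (nearer than 0.5).
dm = [[1.0, 0.2], [1.0, 0.2]]
print(pcf_bilinear(dm, 1.25, 1.0, 0.5))            # 0.25
print(averaged_point_samples(dm, 1.25, 1.0, 0.5))  # 0.5
```

Near a shadow edge the two disagree: PCF's weights follow the exact lookup position, while the equal-weight average only changes when a sample crosses a texel boundary, which is one source of the output-quality difference described above.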
And again, NVIDIA's PCF method is entirely outside of the DirectX specification on PC. It is present in the Xbox's version of DirectX, however, and Futuremark use that to claim that NVIDIA's DST+PCF method for rendering PSMs is a developer-supported technique. But as far as I can make out, that's only the case on Xbox, an entirely different beast.
So in each case, you sample the depth map to get a shadow value to compare against your transformed pixel fragment, and use the result to decide whether that fragment is in shadow. Futuremark only use NVIDIA's hardware filtering in combination with DSTs, so you can turn off both optimisations by disabling DSTs from within the 3DMark05 interface. For equivalent rendering on ATI and NVIDIA hardware, it appears you have to do so.
As for problems with this approach to shadowing, Futuremark say there are issues with projective aliasing on certain geometry that's far away from the light source and whose surface normal is at right angles to the direction of the shadow-casting light. There isn't enough resolution in the depth map there to draw the shadow edge properly, so it steps like a staircase. I'll examine that shortly.
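One way to see why those surfaces suffer is the projection-aliasing term from the PSM literature: the on-screen size of a shadow-map texel scales roughly with cos(beta) / cos(alpha), where alpha is the angle between the surface normal and the light direction and beta the angle between the normal and the view direction. As the normal approaches a right angle to the light, cos(alpha) heads towards zero and each texel gets smeared across many pixels. PSMs mainly target the other (perspective) aliasing term, which is why this case remains. A small sketch of the factor; the function names and the clamp on cos(alpha) are assumptions:

```python
import math


def _unit(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def projection_aliasing(normal, to_light, to_eye, min_cos=1e-6):
    """cos(beta) / cos(alpha) for one surface point.

    alpha: angle between the surface normal and the direction to the light.
    beta:  angle between the surface normal and the direction to the eye.
    Larger values mean one shadow-map texel covers more screen pixels.
    """
    n, l, e = _unit(normal), _unit(to_light), _unit(to_eye)
    cos_alpha = max(sum(a * b for a, b in zip(n, l)), min_cos)
    cos_beta = max(sum(a * b for a, b in zip(n, e)), 0.0)
    return cos_beta / cos_alpha


print(projection_aliasing((0, 0, 1), (0, 0, 1), (0, 0, 1)))     # 1.0
print(projection_aliasing((0, 0, 1), (1, 0, 0.01), (0, 0, 1)))  # roughly 100
```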
In any case, it appears that PSMs, as described by Simon Kozlov in a book on rendering techniques, have been implemented robustly by Futuremark, addressing issues with object self-shadowing and projective aliasing (to a point). For a crude example (something you'll sort of see in 3DMark05's demo mode!), think of a bird flapping its wings. The upward sweep of a wing can come between the bird's body and the light source, causing the bird to cast a shadow on itself. Futuremark also handle the case where a light source is behind the viewer. That's an issue because the light then sits behind the viewer's plane (the viewer is at 0 on the view axis), and anything with a negative value on that axis is outside of normal render space.
Without access to the 3DMark05 source code, the solution seems to be either to extend the view volume behind the viewer far enough to encapsulate the shadow-casting light (lights that don't cast shadows are simply ignored, even if they contribute light to the scene), or to wrap the depth map from a light behind the viewer around the view plane. They can do the latter since it's a high-resolution map. I'm not sure which they do. Add a couple of fixups for near-plane aliasing in the pixel shader (again, I'm guessing, based on a SIGGRAPH 2002 presentation on PSMs) and things look good.
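The first option, extending the view volume backwards, can be sketched as sliding a virtual camera back along its view axis until the shadow-casting light, and any casters behind the viewer, sit in front of its near plane. This is a guess at the idea in the same spirit as the text, not Futuremark's code; the function names and the view-space convention (z increasing forward from the eye) are assumptions.

```python
def virtual_camera_shift(view_space_points, near):
    """How far to slide the camera back along its view axis so every point
    lies at least `near` in front of it (z measured forward from the eye).

    Returns 0.0 when every point is already in front of the near plane.
    """
    min_z = min(z for _, _, z in view_space_points)
    return max(0.0, near - min_z)


def shift_points(view_space_points, shift):
    """Re-express the points relative to the shifted (virtual) camera."""
    return [(x, y, z + shift) for x, y, z in view_space_points]


points = [(0.0, 0.0, 5.0), (1.0, 2.0, -3.0)]   # second point: a light behind the viewer
s = virtual_camera_shift(points, near=0.5)
print(s, shift_points(points, s))               # 3.5 [(0.0, 0.0, 8.5), (1.0, 2.0, 0.5)]
```

Pulling the virtual camera back gives up some of the extra shadow-map resolution a PSM gives to objects near the viewer, so an implementation would want the smallest shift that works, which is what the max() computes.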
Finally, before I waffle on too much more, Futuremark compress all normal maps using DXT5 texture compression. Their reasoning is that 3Dc isn't a DirectX standard and therefore they can't use it. Yet they use NVIDIA's outside-of-DirectX extensions for DSTs, along with hardware PCF. Hmm.
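For reference, a common way to put normal maps into DXT5 (often called DXT5nm) stores the normal's X in the alpha channel, which DXT5 compresses separately and at higher precision, and Y in green, then rebuilds Z in the shader from the fact that the normal has unit length. Whether 3DMark05 lays out its normal maps this way isn't stated, so treat this as an assumption; 3Dc takes the same two-channel idea further by giving both X and Y their own alpha-style block. decode_dxt5nm() is a hypothetical helper working on 8-bit channel values.

```python
import math


def decode_dxt5nm(green, alpha):
    """Rebuild a unit normal from a DXT5 texel storing X in alpha and Y in
    green (8-bit values 0..255); Z comes from x^2 + y^2 + z^2 = 1."""
    x = alpha / 255.0 * 2.0 - 1.0
    y = green / 255.0 * 2.0 - 1.0
    # Clamp guards against compression error pushing x^2 + y^2 past 1.
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)


print(decode_dxt5nm(128, 128))   # close to (0, 0, 1): a flat surface
```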