Where I think 3DMark06 falls down
Before I start discussing where I think 3DMark06 falls down in a technical sense, largely due to how it was developed, let me make it absolutely clear that my limited analysis was done using freely available tools, peeking to see what the application does with D3D9. Please also be aware that some of the analysis is supposition without full access to the HLSL source shaders; I'm working after HLSL conversion to ASM. Here are the main specific things that my analysis says trip 3DMark06 up, given the direction they took.
Image Quality Decisions
3DMark06's shader programs use 16-bit partial precision (_pp) hinting, seemingly wherever possible. Implementing _pp hints demands a before-and-after IQ analysis, to make sure the hints aren't impacting image quality, and the hints only benefit hardware that respects them; some parts don't. So Futuremark have made that investment of time and money to assist per-vendor performance.
Further, Microsoft's documentation on instruction modifiers states that a _pp hint is likely to result in calculations carried out in 16-bit floating point precision (s10e5, usually called FP16). NVIDIA's programmer's guide also asks developers to use _pp hints wherever possible, due to the free FP16 normalise and the speed-ups that come from reduced register pressure.
So Futuremark are prepared to expend time and effort doing IQ analysis on their shader programs, presumably in a mostly unautomated fashion and largely based on the recommendation of just one IHV (and there is only one IHV that supports _pp in pixel shader 3.0).
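To illustrate why that IQ analysis matters, here's a minimal Python sketch (mine, not anything from 3DMark) that round-trips a value through IEEE half precision, the format a _pp hint typically drops calculations to:

```python
import struct

def to_fp16(x):
    # round-trip through IEEE half precision (s10e5), the format a
    # _pp hint typically reduces a calculation to
    return struct.unpack('<e', struct.pack('<e', x))[0]

# 10 mantissa bits give roughly three decimal digits of precision:
# fine for many colour calculations, riskier for texture coordinates
print(to_fp16(1.0))   # 1.0 survives exactly
print(to_fp16(0.2))   # ~0.19995, a small but measurable error
```

That error is invisible in most colour maths but can show up as banding or shimmering in long dependent calculation chains, which is exactly what a before-and-after IQ pass is meant to catch.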
However, Futuremark seem unable to accommodate other IQ-led performance enhancements that'd result from precision reduction. Reducing depth surface precision by using D16 is a performance/IQ tradeoff for vendors that don't pack 24-bit depth surfaces (making a D24 fetch a 32-bit access that saves no bandwidth over D32).
However, as far as my analysis can tell, there are no D16 surfaces used in 3DMark06, despite there being situations (especially when casting point lights) where the renderer doesn't need massive depth precision. Futuremark's own analysis (likely cheaper to do than the _pp work) would show them that easily.
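As a rough sketch of that tradeoff (my numbers and my hypothetical near/far planes, not Futuremark's), here's the worst-case view-space error at the far plane for a standard non-linear depth buffer at 16 and 24 bits:

```python
# hypothetical near/far planes for a local point light's shadow render,
# not values taken from 3DMark06
near, far = 1.0, 50.0

def far_plane_error(bits):
    steps = 2 ** bits
    # one quantisation step of post-projection depth mapped back to view
    # space: dz = (far - near) * z^2 / (far * near * steps), taken at z = far
    return (far - near) * far / (near * steps)

print(far_plane_error(16))  # ~0.037 world units with D16
print(far_plane_error(24))  # ~0.00015 world units with D24
```

A few hundredths of a world unit of error at the far end of a point light's range is the kind of thing soft shadowing can absorb, which is why D16 is worth at least evaluating.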
Shader Model 3.0 claims
Futuremark claim that the Perlin Noise feature test requires SM3.0. It does in the sense that it's compiled to a ps_3_0 target from HLSL, but there's technically nothing in the shader that would stop it being executed on SM2.0 hardware. Indeed, Futuremark miss an opportunity to compare SM2.0 and SM3.0 implementations of the Perlin Noise shader, precisely to show what SM3.0 would buy you in terms of performance!
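To make that concrete, here's a toy 1D Perlin-style noise in Python (the permutation and gradient tables are my own illustration, not Futuremark's shader). The point is that the whole evaluation is table lookups, multiplies and lerps, all of which map onto SM2.0 texture fetches and arithmetic:

```python
import math, random

rng = random.Random(1)
perm = list(range(256))
rng.shuffle(perm)
perm += perm                                   # doubled to avoid index wrapping
grads = [rng.uniform(-1.0, 1.0) for _ in range(256)]

def fade(t):
    # Perlin's quintic smoothstep: 6t^5 - 15t^4 + 10t^3
    return t * t * t * (t * (t * 6.0 - 15.0) + 10.0)

def noise1d(x):
    i = int(math.floor(x)) & 255
    f = x - math.floor(x)
    # gradient contributions from the two surrounding lattice points,
    # which a shader would fetch from permutation/gradient textures
    a = grads[perm[i]] * f
    b = grads[perm[i + 1]] * (f - 1.0)
    # a single lerp: nothing here needs SM3.0-only features
    return a + fade(f) * (b - a)
```

A 2D version as used in the feature test adds more lookups and lerps, but the nature of the maths doesn't change.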
Futuremark also claim use of the ps_3_0 vPos register, which holds the current fragment's screen space coordinates. The register is useful, for example, for indexing into a screen space rendertarget, when doing certain types of shadow mapping, or even when doing per-fragment debugging.
Analysis of 06's shaders, while not 100% conclusive because ASM generation could hide its use, shows it's likely used very little, if at all. Indeed, if compilation to ASM retains the vPos identifier, as the compiler seems to do with other named registers, it's definitely not used. A minor point, perhaps, but vPos enables certain optimisations and techniques, so analysis of its use is prudent.
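For reference, the classic use of vPos is cheap. Here's a sketch (Python standing in for what is a couple of instructions in a real ps_3_0 shader) of mapping vPos to a screen-space rendertarget coordinate, including D3D9's half-texel offset:

```python
def vpos_to_uv(vpos_x, vpos_y, width, height):
    # D3D9 convention: vPos carries the fragment's integer pixel position,
    # so a half-texel offset is added to sample the matching texel's centre
    return ((vpos_x + 0.5) / width, (vpos_y + 0.5) / height)

# the top-left pixel of a 4x4 rendertarget samples at (0.125, 0.125)
print(vpos_to_uv(0, 0, 4, 4))
```

Without vPos, SM2.0-style code has to interpolate an equivalent coordinate from the vertex shader, burning an interpolator and a few instructions, which is precisely the saving vPos exists to provide.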
In terms of dynamic flow control, its use in 3DMark06 is light. Analysis of the ASM shader output shows that Futuremark do indeed likely issue a branch on lit fragments in the main shadow shader for the SM3.0 tests (the shaders are almost identical for Canyon Flight and Deep Freeze, but not quite).
However, it seems as if Futuremark miss another early-out opportunity (presuming I'm interpreting the shader ASM properly, which isn't easy): testing for unlit pixels, which don't need the softening applied at the umbra/penumbra border on the way out to fully lit.
As far as I can tell, the shader therefore softens the entire shadow, not just the shadow's edge. Adding a branch on an unlit-fragment test could therefore possibly help performance, depending on what the hardware does underneath.
You really only want to burn the required shader and texturing performance as infrequently as possible, and PS3.0 branching is a prime candidate for achieving that with Futuremark's CSM implementation.
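The shape of the missing early-out, sketched in Python (my reconstruction of the idea, not Futuremark's actual shader): take a handful of cheap shadow-map comparison taps first, and only pay for the full soft filter when they disagree, i.e. at the umbra/penumbra border:

```python
def soft_shadow(cheap_taps, full_filter):
    # cheap_taps: a few shadow-map comparison results, each 0.0 or 1.0
    est = sum(cheap_taps) / len(cheap_taps)
    if est == 1.0:      # every tap lit: skip the expensive filter entirely
        return 1.0
    if est == 0.0:      # every tap unlit: the early-out 3DMark06 appears to lack
        return 0.0
    return full_filter()  # mixed taps: we're on the border, do the full work

# fully lit fragments never invoke the filter; border fragments do
print(soft_shadow([1.0, 1.0, 1.0, 1.0], lambda: 0.5))   # 1.0
print(soft_shadow([0.0, 1.0, 0.0, 0.0], lambda: 0.25))  # 0.25
```

On hardware with efficient dynamic branching, the win scales with how much of the screen is fully inside or fully outside the shadow, which for most scenes is the vast majority of it.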
Dynamic flow control is one of the real highlights of SM3.0 in terms of what it can allow a developer to do compared to SM2.0. You'd therefore be excused for thinking a future-looking benchmark designed to really tax hardware would hopefully be pushing that envelope as much as possible.
Being inconsistent in what they're prepared to maintain, code path wise
I explained that Futuremark maintain a selection of code paths for certain things, including a three-way split on depth texture support and Fetch4/PCF filtering. However, other small paths that could be maintained (and I know they're small, since I've written them myself in recent times) aren't considered. There's no apparent path for supporting R2VB in the Shader Particles test, and that's maintenance of a test whose shaders are fewer than 50 instructions after expansion by the compiler. The original HLSL shaders should be even smaller.
Supporting R2VB is, in the main, just a few lines of setup code to check for hardware support, with the shader maintenance (especially for a single texture access) minimal.
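For what it's worth, the detection side really is tiny. R2VB support is advertised through a FOURCC format code queried via IDirect3D9::CheckDeviceFormat; a sketch of building that code (Python standing in for D3D's MAKEFOURCC macro):

```python
def make_fourcc(a, b, c, d):
    # D3D's MAKEFOURCC: four characters packed little-endian into 32 bits
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)

# R2VB support is exposed as a FOURCC "format"; checking whether the
# device accepts this code is essentially the whole capability test
R2VB = make_fourcc('R', '2', 'V', 'B')
print(hex(R2VB))  # 0x42563252
```

Everything after that check is ordinary rendertarget and vertex stream setup, which is why I'd call the path cheap to maintain.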
There's no multiple code path maintenance for shadowing, either. It's pretty clear that vendor-specific support is going to be considered for games that use the most popular soft shadowing techniques, taking into account hardware features and performance.
Doing that with 3DMark06's CSM technique, on the surface at least, doesn't seem to be something they'd have trouble undertaking.
Normal map compression is another candidate, given widespread hardware support for 3Dc. A larger download for the compressed surfaces would likely be tolerated, given how large it is already.
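3Dc keeps only the X and Y components of a unit normal, with the shader rebuilding Z; a sketch of the reconstruction (the standard derivation from the unit-length constraint, not anything specific to 3DMark):

```python
import math

def reconstruct_z(x, y):
    # a unit normal satisfies x^2 + y^2 + z^2 = 1, so z can be rebuilt
    # from the two stored channels; the clamp guards against compression
    # error pushing x^2 + y^2 slightly past 1
    return math.sqrt(max(0.0, 1.0 - x * x - y * y))

print(reconstruct_z(0.0, 0.0))  # 1.0: a normal pointing straight out
print(reconstruct_z(0.6, 0.8))  # ~0.0: a fully tilted normal
```

Because the two stored channels each get their own compression block, 3Dc preserves normal detail far better than DXT-compressing all three components, at the cost of the few shader instructions above.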
Oddly engineered CPU tests
They shouldn't have been engineered separately, in my opinion. Instead there should have been a CPU measurement during a brand-new game test, showcasing a real-world, real-time 'game' and graphics application of the new technologies used to build the CPU tests as they stand. The way the tests sit as released means they tell you very little about real-world CPU performance. Supposedly real-world, real-time 3D graphics output combined with synthetic, non real-world CPU tests doesn't sit well, when the two are asked to conspire to create the final score.
Summary
3DMark06 is too inconsistent and doesn't do as much as it should. Or indeed it does entirely too much! Futuremark chose to go down the direction of per-vendor support and multiple code path maintenance, but don't extend that to encompass all the target vendors (where reasonable for them to do so, of course) and implement what they maybe should. And that leads us nicely on to the conclusion.