GPU Technologies - SmartShader HD
Fragment Shader Details
In R420, the fragment (pixel) shader side of ATI's SmartShader implementation targets PS_2_b, a new HLSL compiler profile in DirectX 9.0. It's largely similar to the PS_2_0 target and is actually a subset of PS_2_a, the target for NVIDIA's NV3x hardware. It makes 32 temporary registers available to fragment programs, up from 12 in the PS_2_0 profile (which is what R3x0 hardware supports). The vector, scalar and texture instruction limits each rise to 512 per fragment program.
Unsurprisingly, R420 supports those new limits in hardware. By comparison, the instruction limit in R3x0 hardware is 96. R420 therefore gives developers scope for more complex effects, at higher speed and efficiency than R3x0, at the cost of targeting yet another fragment shader profile.
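To make the difference between the two profiles concrete, here is a minimal sketch that checks which profile a shader would fit, using only the figures quoted above. The names `ShaderStats` and `pick_profile` are hypothetical and not part of any SDK. The sketch also treats the 96-instruction R3x0 limit as one combined budget, which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShaderStats:
    """Resource usage of one compiled fragment program (hypothetical type)."""
    temp_registers: int
    vector_ops: int
    scalar_ops: int
    texture_ops: int


def fits_ps_2_0(s: ShaderStats) -> bool:
    # Figures from the article: 12 temporary registers, 96 instructions.
    # Assumption: the 96 is modelled as a single combined budget.
    total = s.vector_ops + s.scalar_ops + s.texture_ops
    return s.temp_registers <= 12 and total <= 96


def fits_ps_2_b(s: ShaderStats) -> bool:
    # Figures from the article: 32 temporary registers and up to 512
    # vector, scalar and texture instructions (each).
    per_type_ok = max(s.vector_ops, s.scalar_ops, s.texture_ops) <= 512
    return s.temp_registers <= 32 and per_type_ok


def pick_profile(s: ShaderStats) -> Optional[str]:
    """Return the smallest profile the shader fits, or None if neither does."""
    if fits_ps_2_0(s):
        return "ps_2_0"
    if fits_ps_2_b(s):
        return "ps_2_b"
    return None
```

A shader using 20 temporaries and a few hundred instructions falls outside PS_2_0 in this model but fits PS_2_b, which is the extra headroom R420 offers over R3x0.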
The new hardware capabilities enabled by targeting PS_2_b are also exposed to OpenGL fragment programs through the ARB_fragment_program extension.
The F-Buffer (fragment stream FIFO buffer) has changed from R350/R360: it now only processes pixels that are to be multipassed back into the start of the fragment pipeline, saving processing time and bandwidth on pixels that won't be looped back. While the F-Buffer offers the possibility of pixel shaders of unlimited instruction length, it remains to be seen whether it will be exposed via OpenGL extensions any time soon. Exposing it in DirectX would need a new HLSL target.
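The multipass idea can be illustrated with a toy model. This is only a sketch of the concept, not ATI's hardware design. It assumes each fragment's shader can be cut into passes of at most `pass_budget` instructions, with 512 borrowed from the PS_2_b figure purely for illustration. Only fragments with work left over are written to the FIFO. The function name `run_with_f_buffer` is hypothetical.

```python
from collections import deque

PASS_BUDGET = 512  # assumed per-pass instruction limit, for illustration only


def run_with_f_buffer(fragments, pass_budget=PASS_BUDGET):
    """Toy multipass model.

    fragments maps a fragment id to the total instruction count of the
    shader it runs. Returns (passes, f_buffer_writes), where passes maps
    each fragment id to the number of passes it needed.
    """
    f_buffer = deque()   # FIFO of (fragment id, instructions still to run)
    passes = {}
    f_buffer_writes = 0

    # First pass: every fragment enters the pipeline once.
    for frag_id, total in fragments.items():
        passes[frag_id] = 1
        remaining = total - pass_budget
        if remaining > 0:
            # Only fragments that must loop back are stored.
            f_buffer.append((frag_id, remaining))
            f_buffer_writes += 1

    # Later passes: drain the FIFO in order, re-queueing unfinished work.
    while f_buffer:
        frag_id, remaining = f_buffer.popleft()
        passes[frag_id] += 1
        remaining -= pass_budget
        if remaining > 0:
            f_buffer.append((frag_id, remaining))
            f_buffer_writes += 1

    return passes, f_buffer_writes
```

In this model a fragment whose shader fits in one pass never touches the buffer, which is the saving in processing time and bandwidth described above.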