Fragment Shading Performance
A modern fragment shader unit in a Shader Model 3.0 graphics processor is geared around ADD and MADD issue.
vec3 ADD rate
With our own in-house instruction rate test, we explore the limits of the fragment hardware on each GPU. Each chip is able to reach its theoretical peak, which for all the GPUs on test is the product of GPU clock, fragment unit count and two (one for each of the ADD instructions the sub-ALUs in each FP unit can issue).
With 96 sub-units able to issue ADD instructions when those instructions are not dependent on each other, the R580 SKUs are so far ahead of the others as to make it embarrassing for NVIDIA's products, and for ATI's own R520.
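The peak ADD arithmetic described above can be sketched in a few lines. This is a minimal illustration rather than our test code: the function name is hypothetical, and the 650MHz clock is an assumed value chosen for the example, not a figure quoted in this section. The fragment unit count is derived from the 96 ADD-capable sub-units at two per unit.

```python
def peak_add_rate(clock_mhz: float, fragment_units: int, adds_per_unit: int = 2) -> float:
    """Theoretical peak ADD rate in millions of instructions per second.

    Peak = GPU clock x fragment unit count x ADDs issued per unit per clock.
    """
    return clock_mhz * fragment_units * adds_per_unit


# 96 ADD-capable sub-units at two per fragment unit implies 48 fragment units.
R580_FRAGMENT_UNITS = 96 // 2

# Assumption for illustration only: a 650MHz core clock.
ASSUMED_CLOCK_MHZ = 650.0

r580_peak = peak_add_rate(ASSUMED_CLOCK_MHZ, R580_FRAGMENT_UNITS)
print(f"Assumed R580 peak: {r580_peak:.0f} M ADD/s")
```

The figure only holds when the ADDs are independent of each other, which is the condition our test is built to satisfy.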
vec3 MADD rate
One of NVIDIA's architectural changes when creating G70 from their NV40 base was to make sub-unit 2 in the FP units able to issue a MADD instruction. NVIDIA argue that the MADD instruction is prevalent in shader code, and it also affords them revectorisation opportunities in their shader assembler to boot (as it does ATI). Therefore investigating MADD rate is prudent.
None of the drivers tested predicate the shader (as they did in a previous version of our test!), but it's also slightly disappointing to not be able to show any demonstrable repacking. All the hardware hits near its theoretical peak, and you can see the benefits of NVIDIA's FP ALU setup here. ATI can't issue a 2nd MADD per cycle, which helps 7600 GT show a peak rate higher than the 650MHz ATI Radeon X1800 XT.
While the X1800 XT would outrun 7600 GT without any real problem, as we'll show you and as should be apparent if you've been following the theoretical analysis up to now, the 7600 GT does show off what a well-clocked 12 FP unit GPU is capable of with its particular setup.
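To see why a 12 FP unit part can post a higher peak MADD rate than the X1800 XT, the same kind of arithmetic can be applied with a per-unit MADD issue count. The sketch below is illustrative only: the `Gpu` class is hypothetical, and the 560MHz clock for 7600 GT and the 16 fragment units for X1800 XT are assumptions not stated in this section. The 12 FP units, the 650MHz X1800 XT clock, and the two-versus-one MADD issue per unit come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    clock_mhz: float
    fp_units: int
    madds_per_unit: int  # MADDs each FP unit can issue per clock

    def peak_madd_rate(self) -> float:
        """Theoretical peak MADD rate in millions of instructions per second."""
        return self.clock_mhz * self.fp_units * self.madds_per_unit


# Assumption: 560MHz core clock for 7600 GT (not quoted in this section).
geforce_7600_gt = Gpu("GeForce 7600 GT", clock_mhz=560.0, fp_units=12, madds_per_unit=2)

# Assumption: 16 fragment units for X1800 XT (not quoted in this section).
radeon_x1800_xt = Gpu("Radeon X1800 XT", clock_mhz=650.0, fp_units=16, madds_per_unit=1)

for gpu in (geforce_7600_gt, radeon_x1800_xt):
    print(f"{gpu.name}: {gpu.peak_madd_rate():.0f} M MADD/s")
```

Under those assumptions the second MADD per unit more than offsets the smaller unit count and lower clock, which is the effect the MADD test shows.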