Fragment Shader Performance
The focus of a modern graphics processor is its fragment processing hardware. That's not to dismiss vertex rate, mind you, but being vertex-rate limited tends to be the exception rather than the rule. So we use a custom tool to run a variety of shaders on the fragment hardware. It measures the hardware's ability to issue non-dependent instructions with no texture reads, so we can see whether the maximum shader rate can be achieved. A modern fragment processor's main ALU blocks are built around multi-component ADD and MADD vector instructions, so we show the ADD-only and MADD-only rates.
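To make "non-dependent" concrete, here is a minimal Python sketch that builds a straight-line stream of vec3 ADD or MADD instructions in which no instruction reads another's result, plus a checker for that property. The pseudo-assembly syntax, the register names (`v0`, `r0`, ...) and the helpers `make_stream` and `is_independent` are all hypothetical. This is an illustration of the idea, not the custom tool used for these tests.

```python
# Hypothetical sketch: build a straight-line stream of independent vec3
# instructions in a pseudo shader-assembly syntax (not a real ISA), and
# check that no instruction reads a register written earlier in the stream.

def make_stream(op: str, count: int) -> list[tuple[str, str, list[str]]]:
    """Return (opcode, destination, sources) tuples.

    Every instruction writes its own temporary r0..r{count-1} and reads only
    the inputs v0/v1 (plus v2 as the addend for MADD), so no instruction
    depends on another instruction's result.
    """
    if op not in ("add", "mad"):
        raise ValueError("op must be 'add' or 'mad'")
    srcs = ["v0.xyz", "v1.xyz"] + (["v2.xyz"] if op == "mad" else [])
    return [(op, f"r{i}.xyz", list(srcs)) for i in range(count)]


def is_independent(stream: list[tuple[str, str, list[str]]]) -> bool:
    """True if no instruction reads a register written by an earlier one."""
    written = set()
    for _, dst, srcs in stream:
        # Compare register names only, ignoring the .xyz write/read mask.
        if any(s.split(".")[0] in written for s in srcs):
            return False
        written.add(dst.split(".")[0])
    return True


for op, dst, srcs in make_stream("mad", 3):
    print(f"{op} {dst}, {', '.join(srcs)}")
```

A real test shader would also need to consume the results (for example by summing them into the output colour), otherwise a shader compiler could discard them as dead code. That final reduction is dependent, but it is a small fixed cost next to a long independent stream.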
vec3 ADD rate
With two add units per shader ALU on all the chips, ADD rates should scale nicely with ALU count and clock frequency, and that's mostly what happens. RV515 and RV410 both reach 64% of their theoretical peak in this test, with RV530 managing 59%. The difference in efficiency is likely down to driver immaturity when issuing to triple the ALUs.
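The efficiency figures are simply measured throughput divided by theoretical peak, where peak = ALU count × ADD units per ALU × clock. A minimal sketch of that arithmetic follows. The function names and the example chip (4 ALUs at 500 MHz, with 3000 M instructions/s measured) are hypothetical placeholders, not figures from these tests.

```python
# Sketch: theoretical peak vec3 ADD issue rate vs. a measured rate.
# The ALU count, clock and "measured" figure in the example are
# hypothetical placeholders, not results from this article.

def peak_rate(alus: int, ops_per_alu_per_clock: int, clock_mhz: float) -> float:
    """Theoretical peak in millions of instructions per second."""
    return alus * ops_per_alu_per_clock * clock_mhz


def efficiency(measured_mips: float, alus: int, ops_per_alu_per_clock: int,
               clock_mhz: float) -> float:
    """Measured rate as a fraction of the theoretical peak."""
    return measured_mips / peak_rate(alus, ops_per_alu_per_clock, clock_mhz)


# Two ADD units per shader ALU, as described above.
ADD_UNITS_PER_ALU = 2

# Hypothetical chip: 4 shader ALUs at 500 MHz -> 4000 M vec3 ADD/s peak.
peak = peak_rate(4, ADD_UNITS_PER_ALU, 500.0)
eff = efficiency(3000.0, 4, ADD_UNITS_PER_ALU, 500.0)
print(f"peak {peak:.0f} MIPS, efficiency {eff:.0%}")  # peak 4000 MIPS, efficiency 75%
```

The same calculation with one issue slot per ALU gives the MADD peak used in the next test.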
vec3 MADD rate
The MADD rate will always be lower, since only one of the two sub-units in an R4- or R5-class FP ALU has a multiplier. However, RV515 and RV410 maintain an easy 100% of their peak MADD issue rate, based on clock rate, and RV530 isn't far behind at 94%. The hardware is working flat out to process our simple shader, and each chip does so without a problem.
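Why the MADD peak is half the ADD peak can be shown with a toy issue model of a single ALU, assuming (per the description above) two sub-units per clock where only the first can execute a MADD. `cycles_to_issue` is a hypothetical helper, and the model ignores many real-hardware details such as co-issued scalar operations and register-read limits.

```python
# Simplified, hypothetical issue model of one R4/R5-style shader ALU:
# two sub-units per clock, but only the first ("main") sub-unit has a
# multiplier. It can take an ADD or a MADD, while the second sub-unit can
# only take an ADD. All instructions are assumed to be independent.

def cycles_to_issue(ops: list[str]) -> int:
    """Greedy count of clocks needed to issue a list of 'add'/'mad' ops."""
    madds = ops.count("mad")
    adds = ops.count("add")
    cycles = 0
    while madds or adds:
        # Main sub-unit: prefer a MADD, since only it can execute one.
        if madds:
            madds -= 1
        elif adds:
            adds -= 1
        # Second sub-unit: ADD only.
        if adds:
            adds -= 1
        cycles += 1
    return cycles


for kind in ("add", "mad"):
    n = 8
    c = cycles_to_issue([kind] * n)
    print(f"{n} x {kind}: {c} clocks, {n / c:.0f} per clock")
```

Eight independent ADDs take 4 clocks and eight MADDs take 8, so a 100% MADD issue rate corresponds to one MADD per ALU per clock, half the ADD peak.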
Quick hierarchical-Z test
While this isn't strictly a fragment shader test, it matters here: ATI's low-end hardware usually can't discard occluded pixels early because it has no hierarchical Z buffer. That bit of logic lets the hardware discard pixels before they're generated and passed to the fragment units. Without it, you're left killing fragments as they hit the ROPs instead, burning fragment hardware cycles on pixels that will never make it to the screen. ATI announced that RV515 contains hier-Z test logic, though, so it was worth making sure it was there and working.
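The core of a hierarchical Z test can be sketched in a few lines, assuming a "less-than" depth test (smaller depth = nearer). A coarse buffer keeps, per screen tile, the farthest depth currently stored in that tile; if an incoming primitive's nearest depth over the tile is no nearer than that, every pixel it covers there would fail the depth test, so the whole tile can be skipped before fragments are generated. The `HierZ` class, the 4×4 tile size and the update policy are hypothetical and illustrative, not ATI's actual design.

```python
# Minimal, hypothetical sketch of hierarchical-Z tile rejection with a
# "less-than" depth test. Tile size and layout are illustrative only.

TILE = 4  # hypothetical tile edge in pixels


class HierZ:
    def __init__(self, width: int, height: int, clear_depth: float = 1.0):
        self.tiles_x = (width + TILE - 1) // TILE
        self.tiles_y = (height + TILE - 1) // TILE
        # Per-tile maximum (farthest) stored depth; starts at the clear value.
        self.max_z = [[clear_depth] * self.tiles_x for _ in range(self.tiles_y)]

    def can_reject(self, tx: int, ty: int, prim_min_z: float) -> bool:
        """True if the primitive can't pass the depth test anywhere in the tile.

        Every covered pixel has z >= prim_min_z >= tile max >= its stored z,
        so "z < stored" is false for all of them.
        """
        return prim_min_z >= self.max_z[ty][tx]

    def update(self, tx: int, ty: int, new_tile_max_z: float) -> None:
        """Record a new farthest depth for the tile.

        The caller must supply a conservative value (the true maximum over
        all pixels in the tile after fine-grained depth writes).
        """
        self.max_z[ty][tx] = new_tile_max_z


hz = HierZ(8, 8)
hz.update(0, 0, 0.3)             # tile (0, 0) is now fully covered at z <= 0.3
print(hz.can_reject(0, 0, 0.5))  # True: hidden behind what's already drawn
print(hz.can_reject(0, 0, 0.2))  # False: may be visible, rasterise normally
```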
So we ask the hardware to draw a scene back-to-front (the worst case for depth test and discard, since each new layer is nearer than the last and nothing fails the depth test), then front-to-back, to see what the performance delta is. If RV515 has hier-Z, its delta should be in line with RV530 and RV410, both of which have their own hier-Z implementations.
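Why draw order matters so much can be shown with a toy per-pixel model: N opaque layers covering the same pixel, drawn in either order, with a "less-than" depth test applied before shading (as early or hierarchical Z allows). The layer count, depths and the `shaded_fragments` helper are hypothetical.

```python
# Toy per-pixel model: count how many fragments reach the shader when
# opaque layers are depth-tested *before* shading. Depths are hypothetical.

def shaded_fragments(depths: list[float], clear_depth: float = 1.0) -> int:
    stored = clear_depth
    shaded = 0
    for z in depths:
        if z < stored:   # passes the early depth test -> gets shaded
            shaded += 1
            stored = z   # depth write
        # else: rejected before reaching the fragment shader
    return shaded


layers = [0.1 * i for i in range(1, 9)]  # eight layers, 0.1 (near) .. 0.8 (far)
back_to_front = sorted(layers, reverse=True)
front_to_back = sorted(layers)
print(shaded_fragments(back_to_front), shaded_fragments(front_to_back))  # 8 1
```

Without early rejection, all eight fragments would be shaded in both orders and only killed at the ROPs, so a large front-to-back speedup is the signature of working early depth hardware.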
RV515 actually comes out slightly ahead of RV530 here, confirming the presence of the early depth test hardware. You generally don't want to make low-end parts work harder than they have to, since they lack the brute force to compensate, so it's good to see hier-Z included.