RV530
RV530 is a fairly different beast internally to RV515, although there are many functional similarities. Refer back a page to RV515 as needed, as we build up the picture of RV530 from its differences with that chip.

Vertex processing
RV530 shares the same vertex shader hardware as RV515, except that it has five units to RV515's two.

Pixel processing and thread dispatch
There's still only one fragment quad in RV530, just like RV515; the difference is that the four fragment processors making up that quad are three times as 'wide'. Each fragment processor therefore drives three ALUs in lock step rather than a single ALU. The ALUs are the same '4D' units, each capable of the same instruction issue rate, and each block of three ALUs shares the same texturing resources.

The growing share of ALU instructions in the pixel shaders of modern games means RV530's triply parallel fragment hardware has obvious benefits. Raw arithmetic rate goes up threefold, and doing more work per cycle is the underlying tenet of modern 3D graphics processing.
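A back-of-envelope Python sketch of that ALU arithmetic follows. The `FragmentLayout` class and its field names are hypothetical; the counts come from the text (one quad of four fragment processors on both chips, one 4D ALU per processor on RV515 versus three on RV530, and the same four texture samplers on each). Treating every ALU as issuing one instruction per clock is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentLayout:
    """Hypothetical description of a chip's fragment hardware."""
    quads: int           # fragment quads on the chip
    procs_per_quad: int  # fragment processors per quad
    alus_per_proc: int   # 4D ALUs per processor, running in lock step
    samplers: int        # texture samplers shared by all the ALUs

    @property
    def alus(self) -> int:
        return self.quads * self.procs_per_quad * self.alus_per_proc

    def alu_to_tex_ratio(self) -> float:
        # ALU instructions per texture sample each clock, assuming one
        # instruction per ALU per clock (a simplification).
        return self.alus / self.samplers


RV515 = FragmentLayout(quads=1, procs_per_quad=4, alus_per_proc=1, samplers=4)
RV530 = FragmentLayout(quads=1, procs_per_quad=4, alus_per_proc=3, samplers=4)

print(RV515.alus, RV530.alus)                              # 4 12
print(RV515.alu_to_tex_ratio(), RV530.alu_to_tex_ratio())  # 1.0 3.0
```

The 12 matches the FP figure in the 5/12/4 configuration quoted in the properties table. Because texturing stays at four samplers, the ALU:texture ratio rises from 1:1 to 3:1, which is where the gain for arithmetic-heavy shaders comes from.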
So pixel threads are also three times bigger, batched 12 wide in queues of 4, but RV530 maintains only the same 128 threads in flight as RV515.
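The thread arithmetic can be sketched in Python as below. Reading "12 wide in queues of 4" as 12 × 4 = 48 pixels per thread is our interpretation, and RV515's 4-wide threads are inferred from RV530's being "three times bigger"; the function names are hypothetical.

```python
THREADS_IN_FLIGHT = 128  # same on RV515 and RV530, per the text


def pixels_per_thread(width: int, depth: int) -> int:
    """Pixels in one thread: `width` pixels across the ALUs times `depth`
    entries queued behind them (our reading of '12 wide in queues of 4')."""
    return width * depth


def pixels_in_flight(width: int, depth: int, threads: int = THREADS_IN_FLIGHT) -> int:
    """Total pixels resident when every thread slot is occupied."""
    return threads * pixels_per_thread(width, depth)


rv515 = pixels_in_flight(width=4, depth=4)   # inferred: a third of RV530's width
rv530 = pixels_in_flight(width=12, depth=4)
print(pixels_per_thread(12, 4), rv515, rv530)  # 48 2048 6144
```

With the thread count fixed, pixels in flight scale with ALU width: per ALU, both chips end up with 512 pixels in flight (2048 / 4 and 6144 / 12).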
Texture processing and pixel output
RV530 has the same texture processing ability as RV515, pairing four address units with four sampler units to form the entire TMU. Like RV515, it can also fetch the four texels of a bilinear footprint from a single-channel texture and pack them into one four-channel result, a feature ATI call Fetch4.

ROP count is the same as RV515 at four, and all features are identical, barring RV530's doubled Z-only rate. So there are still two colour and two Z writes per cycle, but four Z (depth) writes if you mask off colour.
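To make the Fetch4 idea concrete, here is a Python sketch contrasting it with ordinary bilinear filtering on a single-channel texture. The function names, the clamp-to-edge addressing and the component order of the packed result are assumptions for illustration, not ATI's definition. The point is that the shader receives the four raw texels and can weight them itself, for example for percentage-closer filtering of a shadow map. The last helper simply encodes the ROP rates quoted above.

```python
from typing import List, Tuple

Texture = List[List[float]]  # single-channel texture, indexed tex[y][x]


def texel(tex: Texture, x: int, y: int) -> float:
    # Clamp-to-edge addressing; an assumption for this sketch.
    h, w = len(tex), len(tex[0])
    return tex[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def fetch4(tex: Texture, x0: int, y0: int) -> Tuple[float, float, float, float]:
    """Return the unfiltered 2x2 bilinear footprint at (x0, y0) packed into
    one four-component result. The component order is an assumption."""
    return (texel(tex, x0, y0), texel(tex, x0 + 1, y0),
            texel(tex, x0, y0 + 1), texel(tex, x0 + 1, y0 + 1))


def bilinear(tex: Texture, x0: int, y0: int, fx: float, fy: float) -> float:
    """Ordinary bilinear filtering: blend the same four texels into one value."""
    a, b, c, d = fetch4(tex, x0, y0)
    top = a + (b - a) * fx
    bottom = c + (d - c) * fx
    return top + (bottom - top) * fy


def writes_per_clock(colour_masked: bool) -> Tuple[int, int]:
    """(colour, Z) writes per clock, using the rates given in the text."""
    return (0, 4) if colour_masked else (2, 2)


tex = [[0.0, 1.0],
       [2.0, 3.0]]
print(fetch4(tex, 0, 0))              # (0.0, 1.0, 2.0, 3.0)
print(bilinear(tex, 0, 0, 0.5, 0.5))  # 1.5
print(writes_per_clock(True))         # (0, 4)
```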
Ring bus memory controller
RV530, at 157 million transistors, has the same dual-ring memory controller as R520, just with a 256-bit internal width and a 128-bit external interface to the DRAM devices. The ring bus controller lets client interfaces make memory requests, with writes to the DRAMs going via a crossbar switch that arbitrates write access to the correct device.

Read requests traverse the ring intelligently, at least as far as the memory controller can govern them given its programmable interface. Because the memory controller 'knows' where each broad block of data lives and stores the addresses of those blocks, it sends requests the shortest way round to each of the five stops on the ring.

Four ring stops are for the DRAM devices themselves, which connect to their stop in pairs. The fifth is for general I/O to things like the PCI Express bus and, by extension, ATI HyperMemory, allowing the memory controller to address those resources properly.
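The shortest-path routing can be sketched in Python as follows. This assumes the two rings circulate in opposite directions, so a request can travel either way round, that the five stops sit in the order listed, and that every hop costs the same; the stop labels and the `route` function are hypothetical, not ATI's design.

```python
NUM_STOPS = 5

# Hypothetical stop labels: four DRAM stops (each serving a pair of devices)
# and one I/O stop for PCI Express / HyperMemory traffic.
STOP_NAMES = ["DRAM 0", "DRAM 1", "DRAM 2", "DRAM 3", "I/O"]


def route(src: int, dst: int, n: int = NUM_STOPS) -> tuple:
    """Choose the shorter direction around a ring of n stops.

    Assumes requests may travel either way (counter-rotating rings) and that
    each hop between adjacent stops costs the same. Returns (direction, hops).
    """
    clockwise = (dst - src) % n
    anticlockwise = (src - dst) % n
    if clockwise <= anticlockwise:
        return ("clockwise", clockwise)
    return ("anticlockwise", anticlockwise)


for dst in range(NUM_STOPS):
    direction, hops = route(0, dst)
    print(f"{STOP_NAMES[0]} -> {STOP_NAMES[dst]}: {hops} hop(s) {direction}")
```

With five stops and a choice of direction, no request needs more than two hops, against up to four on a ring that only runs one way.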
We'll go into more depth in a separate piece, but suffice to say the memory controller was designed for flexibility, reduced latency and scalability in terms of clock rate. The controller's layout reduces wire density, and only cost holds back the width of the external interface, with ATI showing the full double-width variant on R520.
Summary and die shot
RV530 features full Hierarchical Z, an 8KiB texture cache (a guess, but likely) and all the Avivo goodies we've reported on previously.

ATI RV530 GPU Properties

| GPU | ATI RV530 |
|---|---|
| Process and Fabricator | TSMC 90nm |
| Die Size | 13 x 12 mm |
| Transistor Count | 157 million |
| DirectX Shader Model | Shader Model 3.0 |
| Basic Configuration (VP/FP/ROP) | 5/12/4 |
| Vertex Shader Info | VS3.0 (no texld); 5D FP32, co-issue MADD; branching; single-cycle trig functions |
| Fragment Processor Info | PS3.0; 4D FP32, dual-issue ADD+MADD; branching; single-cycle trig functions |
| ROP Info | 6x FP MSAA (2 subsamples/cycle); 2x Z-only rate; FP/FX16 blender |
| Texture processing | 4 FP32 address units; 4 samplers; bilinear filter for integer samples |
| Memory Interface | 128-bit external, 256-bit internal ring bus; GDDR to GDDR4 |
| Display output | 2x dual-link DVI TMDS transmitters; ATI Avivo |
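As a rough sanity check on the table's own figures, dividing the transistor count by the quoted die dimensions gives the density below. This assumes 13 x 12 mm is the full die area, so the result is only approximate.

```latex
\[
A = 13\,\text{mm} \times 12\,\text{mm} = 156\,\text{mm}^2, \qquad
\frac{157 \times 10^{6}\ \text{transistors}}{156\,\text{mm}^2} \approx 1.0 \times 10^{6}\ \text{transistors/mm}^2
\]
```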