
ATI's Radeon X700 XT Preview

by Ryszard Sommefeldt on 20 September 2004, 00:00



RV410 versus R420 and R423

For this comparison, I'm setting the X700 XT against an X800 XT Platinum Edition (R420) and an X800 XT (R423), so you can get some indication of the differences between the high-end mid-range and high-end high-end parts that ATI produce. Obviously, X800 PRO sits somewhere in between.

| GPU | RV410 | R420 | R423 |
|---|---|---|---|
| Process | 110nm @ TSMC | 130nm @ TSMC low-k | 130nm @ TSMC low-k |
| Transistor Count | Unknown | Unknown | Unknown |
| Geometry Pipeline | VS2.0 + VS3.0 geometry instancing support | VS2.0 + VS3.0 geometry instancing support | VS2.0 + VS3.0 geometry instancing support |
| Fragment Processor | PS2.0b | PS2.0b | PS2.0b |
| Fragment Processor Setup | 2 full (vector/scalar) ALU (not equal), one texture ALU, F-buffer | 2 full (vector/scalar) ALU (not equal), one texture ALU, F-buffer | 2 full (vector/scalar) ALU (not equal), one texture ALU, F-buffer |
| Fragment Processor Precision | FP24 | FP24 | FP24 |
| Traditional Render Setup | 8 x 1 | 16 x 1 | 16 x 1 |
| ROPs | 8 | 16 | 16 |
| Vertex Shaders | 6 | 6 | 6 |
| Basic Texture Filtering | Bilinear | Bilinear | Bilinear |
| Texture Filtering | Bilinear, Trilinear, 16X Anisotropic | Bilinear, Trilinear, 16X Anisotropic | Bilinear, Trilinear, 16X Anisotropic |
| Antialiasing | Multi-sampling | Multi-sampling | Multi-sampling |
| AA Sample Type | Ordered grid, multiple sample grids, up to 6X, up to 3X temporal | Ordered grid, multiple sample grids, up to 6X, up to 3X temporal | Ordered grid, multiple sample grids, up to 6X, up to 3X temporal |
| Native Bus Support | PEG16X | AGP8X | PEG16X |
| Memory Support | GDDR3 | GDDR3 | GDDR3 |
| Basic Core Frequency | 475MHz | 520MHz | 500MHz |
| Basic Memory Frequency | 1050MHz | 1120MHz | 1000MHz |
| Memory Bus Width | 128-bit, memory crossbar | 256-bit, memory crossbar | 256-bit, memory crossbar |
| Basic Pixel Fillrate | 3800Mpixel/sec | 8320Mpixel/sec | 8000Mpixel/sec |
| Basic Multitexture Fillrate | 3800Mtexel/sec | 8320Mtexel/sec | 8000Mtexel/sec |
| Basic Memory Bandwidth | ~16.80GB/sec | ~35.84GB/sec | ~32.00GB/sec |
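
If you want to check where the fillrate and bandwidth figures in the table come from, the arithmetic is easy to reproduce. The sketch below is only an illustration using the table's own numbers. The function names are mine, and it assumes one pixel per pipe per clock, the quoted memory frequency being the effective data rate, and 1GB meaning 1000MB.

```python
# Figures copied from the table: (pixel pipes, core MHz, memory MHz, bus width in bits)
GPUS = {
    "RV410": (8, 475, 1050, 128),
    "R420": (16, 520, 1120, 256),
    "R423": (16, 500, 1000, 256),
}

def pixel_fillrate_mpixels(pipes, core_mhz):
    """Peak fillrate in Mpixel/sec, assuming one pixel per pipe per clock."""
    return pipes * core_mhz

def memory_bandwidth_gb(mem_mhz, bus_bits):
    """Peak bandwidth in GB/sec: bytes per transfer times transfers per second."""
    return mem_mhz * (bus_bits // 8) / 1000

for name, (pipes, core, mem, bus) in GPUS.items():
    print(name,
          pixel_fillrate_mpixels(pipes, core), "Mpixel/sec,",
          memory_bandwidth_gb(mem, bus), "GB/sec")
```

With one texture unit per pipe (the "x 1" in the render setup row), the multitexture fillrate in Mtexel/sec comes out at the same number as the pixel fillrate.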

RV410 is ATI's first mid-range part produced on TSMC's 110nm process. A smaller process means a smaller die for the same transistor count. RV410 also has significantly fewer transistors than R420, thanks to half the fragment pipes, half the ROPs and a narrower memory bus, so it's significantly smaller than R420 in both die size and transistor count, which is crucial for a mass-market part in this sector.

That ATI didn't chop anything from the Z-buffer optimisation scheme present in R420 is perhaps the most significant part of RV410's design. ATI's previous RV parts have removed the hierarchical Z-buffer found in the full, parent R part. Not so in this case, which lets RV410 be as efficient as possible after the geometry setup engine, discarding pixels before they enter the fragment shaders.
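
To give a feel for why a hierarchical Z-buffer matters, here's a toy sketch of the general idea: keep a coarse record of the farthest depth stored in each tile, and throw away any incoming block that sits entirely behind it. This is not ATI's implementation. The tile size, the class and method names, and the "smaller z is closer" convention are all my own assumptions for illustration.

```python
TILE = 4  # hypothetical tile size in pixels; the real hardware's tile size isn't given here

class HiZBuffer:
    def __init__(self, width, height, far=1.0):
        # width and height are assumed to be multiples of TILE
        self.depth = [[far] * width for _ in range(height)]
        # Coarse level: the farthest depth currently stored in each tile
        self.tile_max = [[far] * (width // TILE) for _ in range(height // TILE)]

    def tile_rejects(self, tx, ty, nearest_incoming):
        """True if an incoming block is wholly hidden behind what the tile already holds."""
        return nearest_incoming >= self.tile_max[ty][tx]

    def write(self, x, y, z):
        """Per-pixel depth test (smaller z is closer); True if the pixel survives."""
        if z >= self.depth[y][x]:
            return False
        self.depth[y][x] = z
        tx, ty = x // TILE, y // TILE
        # Refresh the coarse value for the tile that was touched
        self.tile_max[ty][tx] = max(
            self.depth[j][i]
            for j in range(ty * TILE, (ty + 1) * TILE)
            for i in range(tx * TILE, (tx + 1) * TILE)
        )
        return True

hiz = HiZBuffer(8, 8)
for y in range(4):
    for x in range(4):
        hiz.write(x, y, 0.2)            # a near surface fills tile (0, 0)
print(hiz.tile_rejects(0, 0, 0.5))      # True: hidden, never reaches the fragment shaders
print(hiz.tile_rejects(1, 0, 0.5))      # False: neighbouring tile is empty, must be processed
```

The saving comes from the first test being one comparison per tile rather than one per pixel, so hidden geometry costs almost nothing further down the pipeline.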

Unlike NV43, RV410 keeps one ROP per fragment pipe. NVIDIA's mid-range GPU cuts that in half, limiting its raw, single-textured, bilinear-filtered pixel fillrate, so RV410 has the edge over NV43 in raw pixel output at the same clocks.
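
As a rough illustration of that point, raw pixel output is capped by whichever is narrower, the fragment pipes or the ROPs. The sketch below assumes NV43 has 8 fragment pipes feeding 4 ROPs (the "cut in half" arrangement described above) and runs both chips at the same arbitrary clock. The function name is mine.

```python
def raw_pixel_fillrate(pipes, rops, core_mhz):
    """Peak pixels written, in Mpixel/sec, capped by the narrower of the two stages."""
    return min(pipes, rops) * core_mhz

CLOCK = 475  # same clock for both chips, purely for comparison

rv410 = raw_pixel_fillrate(pipes=8, rops=8, core_mhz=CLOCK)
nv43 = raw_pixel_fillrate(pipes=8, rops=4, core_mhz=CLOCK)  # assumed NV43 layout
print(rv410, nv43)
```

This only describes the simplest case. Once shaders get longer than a clock or two per pixel, the ROPs stop being the bottleneck and the gap narrows.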

RV410's vertex shader unit count is surprising too. It retains the same VS unit count as R420 and R423, giving the X700 XT at 475MHz the same vertex shader power as an X800 PRO. So at the same clocks, RV410 has roughly twice the vertex shader power of NV43. In games that are geometry or vertex shader limited, that could be significant.
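
The "roughly twice" figure falls out of a simple units-times-clock estimate. The sketch below is a crude proxy, not a measurement: it assumes NV43 has 3 vertex shader units and that every unit does the same work per clock on both architectures, which won't hold exactly in practice.

```python
def relative_vertex_power(units, core_mhz):
    """Crude proxy for vertex throughput: units multiplied by core clock."""
    return units * core_mhz

x700_xt = relative_vertex_power(units=6, core_mhz=475)
nv43_same_clock = relative_vertex_power(units=3, core_mhz=475)  # 3 units is an assumption
print(x700_xt / nv43_same_clock)
```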

A 128-bit memory bus, combined with the same fast memory types that R420 and R423 support, means that RV410's memory bandwidth is well balanced against its fragment shader and ROP output.

Like NV43, RV410 has native PEG16X bus support without a bridge chip, which means it will make its debut on PCI Express first, with an AGP variant (native interface again, no bridge chip, probably called RV413 or somesuch) coming later.

RV410's rendering pipeline

RV410's render pipeline is pretty simple to imagine. Much like NV43, RV410 generates geometry using the vertex shader, either by running a vertex program or by emulating the fixed-function path of older hardware and Shader Models. That geometry is sorted for depth, with the Z-buffer and its optimisation scheme compressing and discarding vertex data early in the pipeline. That stops geometry being turned into data that needn't be processed, and ensures that anything which does need to be operated on is optimised as much as possible for the next part of the pipeline.

Vertex data is transformed into pixel fragments and dispatched to the fragment shaders by the GPU's scheduler. The fragment shaders operate on the pixel fragments in parallel as much as possible, optimising data throughput and getting as many fragments through the pipeline as possible. As described in the table above, each fragment shader is made of two functional arithmetic units (vector and scalar units, not equal in capability, the second with undisclosed ability) and a texturing unit that supplies running fragment programs with data from a texture source. The F-buffer can loop fragments back round the fragment shader units with saved state, an unlimited number of times.

After processing in those units, the output fragments are passed into a crossbar dispatch unit, which feeds them into the GPU's ROPs (render output units). Those do all the blending, combining, anti-aliasing and colour compression of the output buffers, before the final processed pixels are spat out for display.

It's not too hard to understand. Generate triangles, convert them to pixels on the screen, run your shaders that change their appearance, output the fragments and do anti-aliasing and buffer combining and blending, draw the pixels on your screen. Et voila (simplified somewhat).
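
For the curious, the steps above can be sketched as a toy software renderer. It says nothing about how RV410 is wired internally. Every function name is mine, the vertex and fragment "shaders" are trivial stand-ins, and the ROP stage is reduced to a plain overwrite with no blending or anti-aliasing.

```python
def vertex_shader(v, offset=(0.0, 0.0)):
    """Stand-in vertex program: translate a vertex (x, y, z)."""
    x, y, z = v
    return (x + offset[0], y + offset[1], z)

def edge(a, b, p):
    """Signed area test: which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterise(tri, width, height):
    """Yield (x, y, z) fragments for the pixel centres the triangle covers."""
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:
        return  # degenerate triangle, nothing to draw
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)
            w0 = edge(b, c, p) / area
            w1 = edge(c, a, p) / area
            w2 = edge(a, b, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                yield x, y, w0 * a[2] + w1 * b[2] + w2 * c[2]

def fragment_shader(x, y, colour):
    """Stand-in fragment program: return a flat colour."""
    return colour

def draw(triangles, width, height):
    depth = [[1.0] * width for _ in range(height)]
    frame = [[(0, 0, 0)] * width for _ in range(height)]
    for verts, colour in triangles:
        tri = [vertex_shader(v) for v in verts]          # generate geometry
        for x, y, z in rasterise(tri, width, height):    # convert to fragments
            if z >= depth[y][x]:
                continue                                 # hidden: skipped before shading
            depth[y][x] = z
            frame[y][x] = fragment_shader(x, y, colour)  # shade, then write out
    return frame

far = ([(0, 0, 0.8), (8, 0, 0.8), (0, 8, 0.8)], (255, 0, 0))
near = ([(0, 0, 0.3), (4, 0, 0.3), (0, 4, 0.3)], (0, 255, 0))
frame = draw([far, near], 8, 8)
print(frame[0][0], frame[0][5], frame[7][7])
```

Swapping the draw order to near-then-far gives the same picture, but the far triangle's hidden fragments are then dropped before the fragment shader ever runs, which is exactly the work the Z-buffer optimisations are there to save.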

Simple summary

All in all, quite the nemesis for NVIDIA's new GPU and more potent in a couple of key places. Let's examine the X700 XT reference board, while also comparing it (exclusively I think) to an X700 PRO engineering sample.