R420
Like the NV40 article, the first thing to consider is a broad overview of the GPU specifications, in handy table format. Comparison is made with NV40 and R360.

GPU | ATI R420 | NVIDIA NV40 | ATI R360 |
Process | 130nm @ TSMC low-k | 130nm @ IBM | 150nm @ TSMC |
Transistor Count | Unknown | 222M | 107M |
Geometry Pipeline | VS2.0* | VS3.0 | VS2.0 |
Fragment Processor | PS2.0* | PS3.0 | PS2.0 |
Fragment Processor Setup | 2 full (vector/scalar) ALU (not equal), one texture ALU, F-buffer | 2 full (vector/scalar) ALU (not equal) each with 1 mini ALU, Fog ALU, per pipe | 1 full ALU, 1 mini ALU, per pipe |
Fragment Processor Precision | FP24 | FP32, FP16 | FP24 |
Traditional Render Setup | 16 x 1 | 16 x 1 | 8 x 1 |
Vertex Shaders | 6 | 6 | 2 |
Basic Texture Filtering | Trilinear | Trilinear | Bilinear |
Texture Filtering | Bilinear, Trilinear, 16X Anisotropic | Bilinear, Trilinear, 16X Anisotropic | Bilinear, Trilinear, 16X Anisotropic |
Antialiasing | Multi-sampling | Multi-sampling and super-sampling | Multi-sampling |
AA Sample Type | Scattered/sparse grid, up to 6X | Rotated grid up to 8X with supersampling combined at 8X | Scattered/sparse grid, up to 6X |
Bus Support | AGP8X | AGP8X | AGP8X |
Memory support | GDDR3** | GDDR3, DDR | DDR, DDR2 |
Basic Core Frequency | 520MHz | 400MHz | 412MHz |
Basic Memory Frequency | 1120MHz | 1100MHz | 730MHz |
Memory Bus Width | 256-bit, 4 partition memory crossbar | 256-bit, 4 partition memory crossbar | 256-bit, 4 partition memory crossbar |
Basic Pixel Fillrate | 8320Mpixel/sec | 6400Mpixel/sec | 3296Mpixel/sec |
Basic Multitexture Fillrate | 8320Mtexel/sec | 6400Mtexel/sec | 3296Mtexel/sec |
Basic Memory Bandwidth | ~35.84GB/sec | ~35.20GB/sec | ~23.40GB/sec |
It's scarily similar to NV40 in places. There are 16 pixel pipelines, together capable of outputting 16 textured pixels per clock cycle. Each fragment processor is capable of 5 arithmetic ops per clock, one on each of the five arithmetic units present. The 520MHz core frequency gives a simply scary 8320Mpixel/sec of single-texture fillrate and 8320Mtexel/sec of multi-texture fillrate (16 pipes x 520MHz). It's got masses of memory bandwidth and uses GDDR3 DRAM devices.
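For anyone wanting to check the headline numbers, here is a minimal Python sketch of the arithmetic behind the fillrate and bandwidth rows. The function names are made up for illustration, and it assumes that the memory frequency in the table is the effective (double data rate) figure and that a GB is 10^9 bytes, which is what the table's values appear to use.

```python
# Rough sketch of the fillrate and bandwidth arithmetic from the spec table.
# Function names are illustrative; clocks and widths come from the table.

def pixel_fillrate_mpixels(core_mhz, pipelines):
    """Peak fillrate in Mpixel/sec: one pixel per pipeline per clock."""
    return core_mhz * pipelines

def memory_bandwidth_gb(effective_mem_mhz, bus_width_bits):
    """Peak bandwidth in GB/sec (10^9 bytes), assuming the quoted memory
    frequency is already the effective double data rate figure."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_mem_mhz * 1e6 * bytes_per_transfer / 1e9

chips = {
    "R420": {"core_mhz": 520, "pipes": 16, "mem_mhz": 1120, "bus_bits": 256},
    "NV40": {"core_mhz": 400, "pipes": 16, "mem_mhz": 1100, "bus_bits": 256},
    "R360": {"core_mhz": 412, "pipes": 8, "mem_mhz": 730, "bus_bits": 256},
}

for name, c in chips.items():
    fill = pixel_fillrate_mpixels(c["core_mhz"], c["pipes"])
    bandwidth = memory_bandwidth_gb(c["mem_mhz"], c["bus_bits"])
    print(f"{name}: {fill} Mpixel/sec, {bandwidth:.2f} GB/sec")
```

With a 16 x 1 or 8 x 1 layout there is one texture unit per pipe, so the multitexture figure in Mtexel/sec comes out the same as the pixel fillrate. These are theoretical peaks, and the table's bandwidth values are rounded.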
The fragment processor precision hasn't changed from ATI's R3x0 GPUs, and is fixed at FP24 for fragment shader operations. It's here that the basis of R420's operation is defined: the GPU doesn't support Shader Model 3.0 in either the fragment or vertex shader, implementing instead a superset of the base Shader Model 2.0 specification. A new DX9 PS profile, PS2_0_b, has been created to expose R420's features under DX9 and SM2.0, since the GPU goes beyond being a simple 16 pipe R360 in its capability. More on that later.
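To put FP24 in context against the FP16 and FP32 modes listed for NV40, here is a small Python sketch comparing the relative step size of each format. The mantissa widths are assumptions not stated in this article: 10 stored bits for FP16, 16 for ATI's FP24 (commonly described as 1 sign, 7 exponent, 16 mantissa bits) and 23 for FP32. The helper name is made up for illustration.

```python
# Sketch: relative precision of the shader float formats in the table.
# Mantissa widths are assumptions (stored bits, excluding the implicit
# leading 1), not figures taken from the article.

def relative_step(mantissa_bits):
    """Gap between 1.0 and the next representable value."""
    return 2.0 ** -mantissa_bits

formats = {"FP16": 10, "FP24": 16, "FP32": 23}

for name, bits in formats.items():
    print(f"{name}: {bits} mantissa bits, relative step ~{relative_step(bits):.2e}")
```

Under those assumptions FP24 sits between the two NV40 modes: 64 times finer than FP16 and 128 times coarser than FP32.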
In short, given the basic spec in the feature table above, it's going to be ohmygod fast, and look ohmygod good, just like NV40. There's a lot more to cover than is specced in the feature table, such as how the fragment shader can go about its business and the new antialiasing mode.