NVIDIA G80 - Clocks and flops and samples and ROPs and things
First of all, confirmation of how big the thing is, who builds it and how fast the powered-by-G80 GeForce 8800 GTX is. The venerable TSMC currently build G80 for NVIDIA, and they're on A2 silicon for the purposes of the first reference and retail boards, but A3 silicon looms large we reckon, tweaking for yields and minor hotspots most likely. It's a 90nm part (90HS as far as TSMC's process grading goes), roughly 20x22mm in size, and contains 681M transistors (according to NVIDIA; we haven't had time to count them all yet to make sure). That's far and away the largest graphics chip ever created by man or beast (well, people do talk about Stuart Oberman in an 'odd' way sometimes), making G71 look tiny (196mm², 278M transistors, same process) and knocking ATI's R580 off the top spot as The Big One™ (352mm², 384M transistors, same process again).
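If you want to play with those size figures yourself, here's a quick back-of-envelope sketch (our own arithmetic on the numbers above; the die dimensions are approximate, so the densities are rough):

```python
# Rough transistor densities from the figures above; die sizes are approximate.
chips = {
    "G80":  (20.0 * 22.0, 681),  # ~440mm^2 from the quoted 20x22mm, 681M transistors
    "G71":  (196.0, 278),
    "R580": (352.0, 384),
}

for name, (area_mm2, mtrans) in chips.items():
    print(f"{name}: ~{area_mm2:.0f}mm^2, ~{mtrans / area_mm2:.2f}M transistors/mm^2")
```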
The next big thing is the chip's frequency. NVIDIA go split again, separating the base chip clock from the clock that drives the shader core. Folks like me were expecting a 2x 'double-pumped' rate if this happened (it arguably makes the chip easier to design that way), but NVIDIA say they're happy to pass data around the chip over non-2x clock boundaries without any real issue. As it stands, with GeForce 8800 GTX, you get a base clock of 575MHz and a shader clock of 1350MHz. Yep, those 128 fully FP32 SPs and the associated 128 interpolator/SF ALUs are all running at over 1GHz in an 8800 GTX, giving rise to big instruction rates and potential shading performance.
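For a rough idea of what that shader clock buys, here's a minimal sketch; the flop counting (the MADD as two flops, the co-issued MUL as one) is our assumption rather than an NVIDIA figure:

```python
# Back-of-envelope peak programmable shading rate for GeForce 8800 GTX.
# Assumption: MADD counts as 2 flops and the co-issued MUL as 1, per SP per clock.
sps = 128
shader_clock_hz = 1350e6
flops_per_sp_per_clock = 3  # MADD (2 flops) + MUL (1 flop)

peak_gflops = sps * shader_clock_hz * flops_per_sp_per_clock / 1e9
print(f"Peak shading rate: {peak_gflops:.1f} GFLOPS")  # 518.4
```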
The difference in architectures (fully scalar versus mostly vector) means it's hard to compare G80 to previous NVIDIA hardware and the current ATI approach in terms of raw mano-a-mano numbers, but we'll try anyway. Note that since it's unified, G80 has all shading units available for all shading ops, no matter the thread type, so the table shows the max available units per cycle only. For those not paying attention, there are not 384 SPs!
Lastly before we move on, the 384-bit memory bus is a first on a consumer graphics part and not to be ignored as we talk about the chip's performance.
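That bus makes the bandwidth arithmetic worth showing; a minimal sketch of the peak number (our own arithmetic, which matches the table below):

```python
# Peak memory bandwidth for GeForce 8800 GTX's 384-bit bus.
bus_width_bits = 384
memory_clock_hz = 900e6  # base clock; GDDR3 transfers data twice per clock
transfers_per_clock = 2

bytes_per_sec = (bus_width_bits / 8) * memory_clock_hz * transfers_per_clock
print(f"Peak bandwidth: {bytes_per_sec / 1e9:.2f} GB/sec")  # 86.40
```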
Clocks and flops and samples and ROPs and things
Spec / Chip | NVIDIA G80 | NVIDIA G71 | ATI R580 |
---|---|---|---|
Variant | GeForce 8800 GTX | GeForce 7900 GTX | Radeon X1950 XTX |
Process | TSMC, 90nm (90HS) | TSMC, 90nm (90HS) | TSMC, 90nm (90HS) |
Transistor Count | 681M | 278M | 384M |
Die Size | 20x22mm | 13.5x14.5mm | 18.5x19.5mm |
Clocks | 575MHz base, 1350MHz shader, 900MHz memory | 650MHz base/shader, 700MHz VS, 800MHz memory | 650MHz base, 1000MHz memory |
DirectX Shader Model | 4.0 | 3.0 | 3.0 |
Vertex Shading | 128 FP32 scalar ALUs, MADD+MUL dual-issue | 8 vec4 + scalar ALUs, MADD co-issue | 8 vec4 + scalar ALUs, MADD co-issue |
Fragment Shading | 128 FP32 scalar ALUs, MADD+MUL dual-issue | 24 vec3 + scalar ALUs, MADD+MADD dual-issue | 48 vec3 + scalar ALUs, MADD+ADD dual-issue |
Geometry Shading | 128 FP32 scalar ALUs, MADD+MUL dual-issue | None | None |
Data Sampling and Filtering | 32ppc address and 64ppc bilinear INT8 filtering, max 16xAF | 24ppc address and 24ppc bilinear INT8 filtering, max 16xAF | 16ppc address and 16ppc bilinear INT8 filtering, max 16xAF |
ROPs | 24, 8Z or 8C samples/clk, 2clk FP16 blend, 8xMSAA, 16xCSAA | 16, 2Z or 2C samples/clk, 2clk FP16 blend, 4xMSAA | 16, 2Z or 1C samples/clk, 2clk FP16 blend, 6xMSAA |
Memory Interface | 384-bit, 6 memory channels, GDDR->GDDR4 | 256-bit, 4 memory channels, GDDR->GDDR3 | 256-bit, 8 memory channels, GDDR->GDDR4 |
Memory Bandwidth | 86.40GB/sec | 51.20GB/sec | 64.00GB/sec |
Theoretical Rates for GeForce 8800 GTX and GeForce 7900 GTX
Rate | NVIDIA GeForce 8800 GTX | NVIDIA GeForce 7900 GTX |
---|---|---|
Core Clock | 575MHz (1350MHz shader) | 650MHz (700MHz VS) |
Pixel fillrate | 13.8G pixels/sec | 10.4G pixels/sec |
Texture sampling rate | 36.8G bilerps/sec | 15.6G bilerps/sec |
Z-only fillrate | 110.4G samples/sec | 20.8G samples/sec |
Vertex transform rate | 10.80G verts/sec | 1.40G verts/sec |
VP MADD issue rate | 172.8G instr/sec | 5.60G instr/sec |
FP MADD issue rate | 172.8G instr/sec | 31.2G instr/sec |
Now the instruction issue rates aren't quite fair (scalar vs. vector, only MADD, etc.), but we display them like that on purpose to highlight once more that G80 is entirely scalar in its ALU makeup and that it has a 1350MHz shader clock in 8800 GTX form. Peak rates mean little without some measure of the efficiency of the shader core, and maximising that efficiency is exactly what going scalar is meant to achieve in G80. Simple divides will get you to the peak vec4 MADD rates for vertex and fragment shading if you're horribly concerned.
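And since we mention simple divides, here they are done for you; a quick sketch using the table numbers above, with the vec4 normalisation being our framing rather than either vendor's:

```python
# The 'simple divide': normalise G80's scalar MADD issue rate to vec4 terms
# so it sits next to G71's vector ALUs. Numbers taken from the tables above.
g80_scalar_madds = 128 * 1350e6        # 172.8G scalar MADDs/sec
g80_vec4_madds = g80_scalar_madds / 4  # 43.2G vec4-equivalent MADDs/sec

g71_vec4_madds = 24 * 2 * 650e6        # 31.2G vec4 MADDs/sec (MADD+MADD dual-issue)

print(f"G80: {g80_vec4_madds / 1e9:.1f}G vec4 MADDs/sec")
print(f"G71: {g71_vec4_madds / 1e9:.1f}G vec4 MADDs/sec")
```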
As far as thinking about possible game performance ahead of actual measurement went, the theoretical analysis pointed one way: a surfeit of bilinear filtering ability and the potential for near-peak efficiency in the shader core suggested performance should flow freely, and with the ROP hardware at the end of the chip looking very sweet and peak theoretical memory bandwidth as high as it is, GeForce 8800 GTX has the on-paper potential to fly.
NVIDIA mentioned at Editor's Day for G80 that they'd taken a look at current game shaders, and at shaders implementing likely popular rendering algorithms in the future, both SM3.0 and SM4.0, as the reason behind going entirely scalar and reverting to MADD+MUL as the primary instruction basis for the SPs. Remember that NV40 (and NV30 if we remember correctly) was also MADD+MUL, with NVIDIA adding an ADD (see what we did there?) back to the fragment ALUs in G7x (and NV35!) to go dual-MADD again. The reversion sees the company flip-flop the instruction basis for the 5th major architecture change in a row. Kind of cool to note (and it's the reason why NVIDIA quoted MUL rates at Ed's Day, rather than MADDs!).
So did our pre-game test thinking translate into what was largely expected, IQ and performance wise? Well, we'll tell you, but not before looking at a board!