NV40
Let's do things the easy way here, with a table to peek at so we can discuss the details. The GPUs to talk about are as follows: NVIDIA's outgoing NV38, which powers its current high-end product, the GeForce FX 5950 Ultra; ATI's R360, which powers its Radeon 9800XT; and of course NVIDIA's NV40, the focus of this article and the GPU that powers the GeForce 6800 Ultra review sample.
Specification | NVIDIA NV40 | ATI R360 | NVIDIA NV38 |
Process | 130nm @ IBM | 150nm @ TSMC | 130nm @ TSMC |
Transistor Count | 222M | 107M | 130M |
Geometry Pipeline | VS3.0 | VS2.0 | VS2.0+ |
Fragment Processor | PS3.0 | PS2.0 | PS2.0+ |
Fragment Processor Setup | 2 full ALU (not equal) each with 1 mini ALU, Fog ALU, per pipe | 1 full ALU, 1 mini ALU, per pipe | 1 full ALU, 2 small ALU, per pipe |
Fragment Processor Precision | FP32, FP16 | FP24 | FP16, FP32 |
Traditional Render Setup | 16 x 1 | 8 x 1 | 4 x 2 |
Vertex Shaders | 6 | 2 | Adaptive array |
Basic Texture Filtering | Trilinear | Bilinear | Bilinear |
Texture Filtering | Bilinear, Trilinear, 16X Anisotropic | Bilinear, Trilinear, 16X Anisotropic | Bilinear, Trilinear, 8X Anisotropic |
Antialiasing | Multi-sampling | Multi-sampling | Multi-sampling and super-sampling |
AA Sample Type | Rotated grid up to 8X with supersampling combined at 8X | Scattered/sparse grid, up to 6X | Ordered grid, up to 4X, up to 8X with super sampling |
Bus Support | AGP8X | AGP8X | AGP8X |
Memory support | GDDR3 | DDR, DDR2 | DDR, DDR2 |
Basic Core Frequency | 400MHz | 412MHz | 475MHz |
Basic Memory Frequency | 1100MHz | 730MHz | 950MHz |
Memory Bus Width | 256-bit, memory crossbar | 256-bit, memory crossbar | 256-bit, memory crossbar |
Basic Pixel Fillrate | 6400Mpixel/sec | 3296Mpixel/sec | 1900Mpixel/sec |
Basic Multitexture Fillrate | 6400Mtexel/sec | 3296Mtexel/sec | 3800Mtexel/sec |
Basic Memory Bandwidth | ~35.2GB/sec | ~23.4GB/sec | ~30.4GB/sec |
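The three derived rows at the bottom of the table fall out of the clocks and layout listed above them. As a rough sketch of that arithmetic, the snippet below assumes that the memory frequency quoted is the effective (double data rate) figure, that peak fillrate is simply core clock multiplied by pipelines (and by textures per pipe for texels), and that 1GB means 10^9 bytes. The `Gpu` class and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    """Hypothetical container for the handful of table rows used below."""
    name: str
    core_mhz: int           # Basic Core Frequency
    memory_mhz: int         # Basic Memory Frequency (assumed effective DDR rate)
    bus_bits: int           # Memory Bus Width
    pipes: int              # first number in Traditional Render Setup
    textures_per_pipe: int  # second number in Traditional Render Setup

    def pixel_fillrate(self) -> int:
        """Peak Mpixel/sec, assuming one pixel per pipe per clock."""
        return self.core_mhz * self.pipes

    def texel_fillrate(self) -> int:
        """Peak Mtexel/sec, assuming every texture unit is busy each clock."""
        return self.core_mhz * self.pipes * self.textures_per_pipe

    def memory_bandwidth(self) -> float:
        """Peak GB/sec, with 1GB taken as 10^9 bytes."""
        return self.memory_mhz * self.bus_bits / 8 / 1000


GPUS = [
    Gpu("NV40", 400, 1100, 256, 16, 1),
    Gpu("R360", 412, 730, 256, 8, 1),
    Gpu("NV38", 475, 950, 256, 4, 2),
]

for gpu in GPUS:
    print(f"{gpu.name}: {gpu.pixel_fillrate()} Mpixel/sec, "
          f"{gpu.texel_fillrate()} Mtexel/sec, "
          f"~{gpu.memory_bandwidth():.1f} GB/sec")
```

These are peak theoretical numbers only; they say nothing about how often the hardware gets near them in a real game.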
The basic specs give you an initial theoretical performance picture, with NV40's improvements over NV3x quite clear. The easy ones to spot are 16 basic pixel pipelines, 6 'fixed' vertex shader units, twice the shader horsepower, trilinear texture filtering as the default filtering method (more on that later), rotated grid multisampling for (hopefully) better AA quality, up to 16-sample angle-adaptive anisotropic filtering (available with trilinear throughout) and 32-bit precision throughout the entire gamut of processing functions.
Shader Model 3.0 support in DirectX 9.0c is the other big feature addition, but more on that later.
NV40's basic performance figures and features look like a decent jump over the previous high-end NVIDIA GPU. NVIDIA appear to have agreed with everyone else about NV3x's biggest weaknesses and have chopped out the bad bits wholesale, replacing them completely.
According to the documentation, the shader units are completely new. Notice the ordering in the data table above when listing each GPU's fragment processor precision options. With NV38 I listed FP16 first, as that's the mode the GPU extracts the most performance from.
With NV40, full 32-bit floating point precision everywhere is what's stressed. On top of that emphasis, the GPU gains features that NV3x doesn't have, like full 32-bit floating point filtering, blending and texture/surface support throughout. NV40 can finally render to multiple 32-bit render targets (with some caveats, at least initially), something NVIDIA never properly implemented in NV3x. The improvements, especially NV40's 32-bit floating point performance and its support for 32-bit render targets, are most welcome.
It appears that NVIDIA have done 32-bit floating point right with NV40, at least on the surface. Implementation details have spoiled previous parties, something we won't forget in a hurry.
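To give a feel for why the FP16 versus FP32 question matters, here's a small sketch that rounds a couple of values through each format on the CPU. It assumes the GPU formats behave like the standard IEEE half and single precision layouts (10 and 23 mantissa bits respectively), and it says nothing about how fast either GPU runs in either mode. The `round_trip`, `as_fp16` and `as_fp32` helpers are made up for illustration.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store a value in the given struct format, then read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def as_fp16(value: float) -> float:
    # "e" is IEEE half precision: 1 sign, 5 exponent, 10 mantissa bits
    return round_trip(value, "<e")


def as_fp32(value: float) -> float:
    # "f" is IEEE single precision: 1 sign, 8 exponent, 23 mantissa bits
    return round_trip(value, "<f")


for value in (0.1, 2049.0):
    print(f"{value}: FP16 -> {as_fp16(value)!r}, FP32 -> {as_fp32(value)!r}")
```

Above 2048, FP16 can only represent every other whole number, so 2049.0 comes back as 2048.0 while FP32 keeps it exact. Rounding of that sort can show up as banding or texture addressing errors once a shader gets long enough.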
Finally, given previous criticism of NVIDIA's texture filtering performance and quality, the implementation of trilinear filtering as the default texture filtering method, along with new angle-adaptive anisotropic filtering, should help raise NV40's basic image quality to a new level, something sorely needed.
All of the above will be examined in forthcoming pages. First, however, a look at the reference board sent to reviewers.