GeForce3 Architecture
Shown below are the key elements that make up the GeForce3 Ti core. The hardware features remain the same as the original GeForce3. There are two additional features hidden inside the GeForce3 core, which are enabled when you use the new Detonator XP drivers. The new Detonator drivers also enable these features on the old GeForce3, so in fact all you actually gain from the Ti GPU is more speed. The first of the new features is the shadow buffer.
Shadow Buffers
Shadow buffers create realistic shadow effects in real time. They enable self-shadowing for characters and objects and soften the edges of shadows, adding depth to scenes and highlighting spatial relationships between objects. Shadow buffers are now accessible to end users, where previously they were limited to very high-end workstations designed specifically for 3D animation. Shadow buffers were used extensively in the film Final Fantasy to give a more realistic look to the animations and make the people look much more fluid and lifelike.
The Nvidia shadow buffer technique used by the GeForce3 Ti GPU involves a "map" of which objects are lit in the scene. This map is stored in a shadow buffer so that it can be used later and accessed like a texture through the special shadow buffer hardware located in the texture engine within the GeForce3 Ti GPU. The diagram below shows the layout of the GeForce3 GPU.
Until now, shadow buffering was only available on professional rendering equipment, outside of the consumer's reach. With the addition of shadow buffer technology to the NVIDIA GeForce3 Ti GPUs, PC graphics get closer to replicating a real-world visual experience, closing the gap between PC gaming and cinematic-style special effects.
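To make the "map of which objects are lit" idea a little more concrete, here is a minimal sketch in Python. It assumes the usual two-pass approach (record what the light sees first, then compare each point against that record) and shrinks the scene to a 1D row of columns with a light shining straight down. The function names and the scene data are made up for illustration and are not an NVIDIA API.

```python
# Toy shadow buffer: a light shines straight down onto a 1D row of columns.
# All names and data here are illustrative assumptions, not an NVIDIA API.

def build_shadow_buffer(occluders, width):
    """Pass 1: from the light's view, record the topmost surface height per column."""
    buffer = [0.0] * width              # 0.0 means only the ground is there
    for x, top in occluders:
        buffer[x] = max(buffer[x], top)
    return buffer

def is_lit(shadow_buffer, x, height, bias=1e-6):
    """Pass 2: a point is lit if nothing recorded in its column sits above it."""
    return height + bias >= shadow_buffer[x]

occluders = [(2, 5.0), (3, 1.5)]        # (column, top height) of objects in the scene
shadow_buffer = build_shadow_buffer(occluders, width=6)

print(is_lit(shadow_buffer, 0, 0.0))    # open ground
print(is_lit(shadow_buffer, 2, 0.0))    # ground underneath the tall object
print(is_lit(shadow_buffer, 2, 5.0))    # the top of the tall object itself
```

The small bias in the comparison is there so that a surface does not end up shadowing itself through rounding error. In hardware the lookup in pass 2 would be done by the texture engine, as described above.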
3D Textures
A new technology that is part of the nfiniteFX Engine, 3D textures make hollow
objects solid with true three-dimensional material properties such as wood grain
or marbling.
NVIDIA 3D textures can produce the following effects:
- Volumetric Fog—provides depth and density to fog effects, rather than just height and width
- Imposters—allow for the perception of correct look and orientation when the camera moves, without excessive calculation
- Functions Lookup—functions such as depth-of-field effects and BRDFs (Bidirectional Reflectance Distribution Functions) can be stored in a 3D texture, saving both time and effort
- Procedural Textures and Noise—explosions, lightning, or plasma effects can be created with NVIDIA's 3D texture technology.
Once 3D textures become widespread, games will have the chance to become a lot more realistic looking. As yet I haven't actually seen any demos or games that use the feature, but it's only a matter of time.
GeForce3 Technical Features
The GeForce3 uses a number of nifty tricks to decrease its memory bandwidth requirements, thus increasing the speed of games.
High Order Surfaces
This is a new way to describe a surface in a 3D game environment. Instead of many thousands of polygons creating a surface, the surface is broken down into control points. A very complex surface can be described with a small number of control points, which makes much more efficient use of the available memory bandwidth. The GeForce3 can process these high order surfaces in hardware: it takes the control points and generates the final surface on the card. With just 16 control points the GeForce3 can create the equivalent of hundreds of thousands of polygons' worth of geometric data. This is effectively hundreds or even thousands of times more efficient than transmitting the same geometry as triangles across the AGP bus.
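As a rough illustration of why 16 control points go such a long way, the sketch below evaluates a bicubic Bezier patch, which is one common high order surface defined by a 4x4 grid of control points. Whether the GeForce3 hardware uses exactly this formulation is an assumption on my part, and the control grid, the tessellation level and the helper names are all made up for the example.

```python
from math import comb

def bernstein(i, t):
    """Cubic Bernstein basis polynomial B_i(t) for i in 0..3."""
    return comb(3, i) * t**i * (1 - t)**(3 - i)

def eval_patch(control, u, v):
    """Evaluate a bicubic Bezier patch (4x4 grid of (x, y, z) points) at (u, v)."""
    point = [0.0, 0.0, 0.0]
    for i in range(4):
        for j in range(4):
            w = bernstein(i, u) * bernstein(j, v)
            for k in range(3):
                point[k] += w * control[i][j][k]
    return tuple(point)

def tessellate(control, n):
    """Sample an (n+1) x (n+1) vertex grid; return the vertices and triangle count."""
    verts = [eval_patch(control, i / n, j / n)
             for i in range(n + 1) for j in range(n + 1)]
    return verts, 2 * n * n

# A gently curved 4x4 control grid (made-up data).
control = [[(i, j, 0.5 * ((i - 1.5) ** 2 + (j - 1.5) ** 2)) for j in range(4)]
           for i in range(4)]

verts, triangles = tessellate(control, n=64)
sent_over_bus = 16 * 3 * 4         # 16 control points, 3 floats each, 4 bytes per float
expanded = len(verts) * 3 * 4      # the same surface as plain vertex positions
print(triangles, sent_over_bus, expanded)
```

The tessellation level is a free parameter here. The control point data stays at 192 bytes no matter how finely the patch is subdivided, which is where the bandwidth saving comes from.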
Pixel Memory Bandwidth
Traditionally all the pixels in a scene are rendered, regardless of whether they are visible in the final scene. Rendering a single pixel once requires a number of operations. The graphics processor reads the colour buffer to discover the previous value, reads the z-value to determine the depth of the pixel in the scene, and reads the texture data necessary to texture map that pixel. Once the pixel is generated, the new colour value must be written to the colour buffer, and potentially a new z value to the z-buffer. In the case of 32-bit rendering each access requires 32 bits, or 4 bytes, of data. So:
Colour Read | Z Read | Texture Read | Colour Write | Z Write | Total |
4 bytes | 4 bytes | 4 bytes | 4 bytes | 4 bytes | 20 bytes |
To keep this from getting too complex I'll just fill in the figures to show what this means per frame. As an assumption, each pixel in a frame is rendered 2.5 times, which is the average depth complexity of a modern game. Using a resolution of 1024*768 in 32-bit colour this works out to:
Horizontal Resolution | Vertical Resolution | Depth Complexity | Bytes/Pixel | Bytes/Frame |
1024 | 768 | 2.5 | 20 | 39,321,600 |
What all that adds up to is 39.3MB per frame. If you assume a frame rate of 60 frames per second, you come out with a memory bandwidth of 2.4GB/sec! If you do the same calculation for a resolution of 1600*1200 in 32-bit colour, the bandwidth requirement goes up to 5.8GB/sec!
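If you want to try other resolutions or frame rates, the arithmetic above is easy to reproduce. This is just a small Python sketch of the same sum, using the article's assumptions (five 4-byte accesses per pixel, depth complexity of 2.5, 60 frames per second). The function names are my own.

```python
BYTES_PER_ACCESS = 4        # 32-bit colour, z and texel values
ACCESSES_PER_PIXEL = 5      # colour read, z read, texture read, colour write, z write

def bytes_per_frame(width, height, depth_complexity=2.5):
    """Frame buffer and texture traffic for one frame, in bytes."""
    return int(width * height * depth_complexity
               * BYTES_PER_ACCESS * ACCESSES_PER_PIXEL)

def bandwidth_gb_per_sec(width, height, fps=60, depth_complexity=2.5):
    """Sustained bandwidth needed at the given frame rate, in GB/sec (10^9 bytes)."""
    return bytes_per_frame(width, height, depth_complexity) * fps / 1e9

for w, h in [(1024, 768), (1600, 1200)]:
    print(w, h, bytes_per_frame(w, h), round(bandwidth_gb_per_sec(w, h), 2))
```

The results come out at roughly 2.36 and 5.76 GB/sec, which round to the 2.4 and 5.8 GB/sec quoted above.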
From the above calculation you can see why the GeForce series of cards has been using DDR memory for some time now. Bandwidth is a key problem area for increasing the realism of graphics on computers.
GeForce3 uses several new techniques to make better use of its memory bandwidth. The key features are:
- Crossbar Memory Controller
- Lossless Z Compression
- Z Occlusion Culling
Crossbar Memory Controller
Traditional memory controllers have become fairly well developed, and they now achieve more than 50% of the peak memory bandwidth from the frame buffer on the card under most conditions. In today's 128-bit DDR memory controllers data is accessed in 256-bit chunks, since DDR memory transfers twice the information in a single access. Large chunks might sound good on the whole, but consider a single triangle, the building block of a 3D game, that is only a few pixels in size. If the data for that triangle is only 64 bits, a memory controller that can only transfer 256-bit chunks wastes a lot of bandwidth. In this example the 128-bit DDR controller would be only 25% efficient, wasting 75% of the precious memory bandwidth.
The GeForce3 implements a new crossbar memory controller which can access and process 64-bit chunks of data, which ensures that the memory bandwidth is spent on useful data transfers rather than wasted. The crossbar memory controller consists of 4 independent memory controllers. Each of these communicates with the others, automatically load balancing so that the memory controller as a whole is always working at maximum efficiency. To help explain the controller, a crossbar memory controller diagram is included below.
All this adds up to extra speed. Under the typical complex loads which will start to appear in next generation games, the crossbar memory controller can be up to 4 times as efficient as previous, less intelligent designs.
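The 25% figure and the "up to 4 times" claim both fall out of the same simple ratio between the data you wanted and the data the controller had to fetch. Here is a quick Python sketch of that ratio. The function name is hypothetical, and it only models access granularity, not the load balancing between the four controllers.

```python
def fetch_efficiency(useful_bits, chunk_bits):
    """Fraction of the fetched bits that were actually wanted."""
    chunks = -(-useful_bits // chunk_bits)      # ceiling division
    return useful_bits / (chunks * chunk_bits)

old = fetch_efficiency(64, 256)    # 64 bits of triangle data, 256-bit accesses
new = fetch_efficiency(64, 64)     # the same data with 64-bit accesses
print(old, new, new / old)
```

For a 64-bit request this gives 0.25 against 1.0, a factor of 4. Larger requests narrow the gap, which is why the benefit is quoted as "up to" 4 times.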
Lossless Z Compression
The z-buffer is the means by which the depth, and therefore the visibility, of an object is determined for each pixel that will be displayed in the final rendered scene. Traditional graphics processors read and potentially write z data for every pixel they render, hence z-buffer traffic is one of the largest consumers of memory bandwidth. GeForce3 implements an advanced form of z-buffer compression which offers 4:1 lossless data compression, hence the z-buffer data bandwidth is reduced by a factor of four. The z-buffer compression is implemented in hardware and as such is transparent to the application. The compression and decompression of the z-buffer data is performed in real time by the Lightspeed Memory Architecture's z-compression/decompression engine. As there is no loss during compression and decompression, there is no loss of visual quality or z-buffer precision. So image quality remains the same but the speed increases.
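Nvidia has not said exactly how the z compression works, so the following Python sketch is only a stand-in to show what "lossless" means in practice. It assumes a scheme where a 4x4 tile of depth values that all lie on one plane is stored as three numbers, with a fallback to the raw values when that is not possible. The tile size, the scheme and the function names are all my own assumptions.

```python
def compress_tile(tile):
    """Try to store a 4x4 tile of integer depths as a plane: (base, dx, dy)."""
    base = tile[0][0]
    dx = tile[0][1] - base
    dy = tile[1][0] - base
    planar = all(tile[y][x] == base + x * dx + y * dy
                 for y in range(4) for x in range(4))
    if planar:
        return ("plane", (base, dx, dy))
    return ("raw", [row[:] for row in tile])    # fallback keeps the scheme lossless

def decompress_tile(packed):
    """Rebuild the exact original tile from either representation."""
    kind, data = packed
    if kind == "raw":
        return [row[:] for row in data]
    base, dx, dy = data
    return [[base + x * dx + y * dy for x in range(4)] for y in range(4)]

flat_wall = [[1000 + 3 * x + 7 * y for x in range(4)] for y in range(4)]
edge = [row[:] for row in flat_wall]
edge[2][2] = 50                                 # a second surface cuts into the tile

for tile in (flat_wall, edge):
    packed = compress_tile(tile)
    assert decompress_tile(packed) == tile      # exact round trip either way
    print(packed[0])
```

The point is that the decompressed values are bit-for-bit identical to the originals, so the depth test behaves exactly as it would without compression. The real hardware ratio of 4:1 is Nvidia's figure and is not something this toy reproduces.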
Visibility Subsystem: Z-Occlusion Culling
As mentioned previously, traditional graphics architectures render every pixel of every triangle as they receive it, accessing the frame buffer to determine the correct colour and z (depth) values for every pixel. This method produces the correct results, but at the expense of wasted bandwidth, as every pixel is rendered regardless of visibility in the final scene. Typical content in today's games has a depth complexity of around 2, which means that for every pixel visible in the final scene, 2 have been rendered. For every visible pixel the GPU is therefore forced to access the frame buffer twice, using up valuable bandwidth on something that is never seen by the end user.
To get around this problem GeForce3 implements a sophisticated z-occlusion culling system, which attempts to determine the visibility of a pixel before it is rendered. If the pixel is deemed to be invisible in the final scene because it is hidden by another object, it is not rendered, which saves time and bandwidth. Depending on how complex the scene is, this can mean tremendous increases in efficiency. As current games have a depth complexity of 2, z-occlusion culling can save up to 50% of the bandwidth. As new games are created with greater depth complexity, say 4, the GeForce3 z-occlusion culling system will show even greater benefits, with up to a 4 times improvement in memory bandwidth efficiency.
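The 50% and "4 times" figures are best-case numbers, and a tiny simulation shows where they come from. The Python sketch below is a toy model with a 1D "screen" and scene layers that arrive front to back. It is not a description of the actual GeForce3 culling hardware, and all names are made up.

```python
def render(fragments, width, early_z):
    """fragments: list of (x, depth) in arrival order. Returns how many get shaded."""
    zbuf = [float("inf")] * width
    shaded = 0
    for x, depth in fragments:
        visible = depth < zbuf[x]
        if early_z and not visible:
            continue            # rejected before any texture or colour traffic
        shaded += 1             # texture read and colour read/write would happen here
        if visible:
            zbuf[x] = depth
    return shaded

def scene(width, depth_complexity):
    """Full-screen layers arriving front to back: the best case for culling."""
    return [(x, float(layer))
            for layer in range(depth_complexity) for x in range(width)]

for dc in (2, 4):
    frags = scene(width=100, depth_complexity=dc)
    without = render(frags, 100, early_z=False)
    with_cull = render(frags, 100, early_z=True)
    print(dc, without, with_cull, 1 - with_cull / without)
```

With a depth complexity of 2 half of the pixel work disappears, and with 4 three quarters of it does. If the same layers arrive back to front nothing can be rejected, so the order in which a game draws its objects matters a great deal.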
As an additional technique to further increase efficiency, game developers can use a technique called "occlusion query". Essentially the application asks the GPU to test certain regions of the scene for visibility. If the GPU determines that a region is going to be occluded, then all of the geometry and rendering for that region can be skipped. In some cases these bandwidth saving techniques can deliver as much as four times the performance of previous architectures, while in practice the typical benefit averages around 50% to 100%.
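From the application's side an occlusion query boils down to "test a cheap bounding shape first, and only send the expensive geometry if some of it would be visible". The Python sketch below shows that control flow on a 1D depth buffer. The function names, the object list and the triangle counts are all hypothetical, and a real game would issue the query through its graphics API rather than loop over the depth buffer itself.

```python
def occlusion_query(zbuf, box):
    """Count the pixels of a bounding box that would pass the depth test.
    box = (x0, x1, nearest_depth); nothing is actually drawn."""
    x0, x1, depth = box
    return sum(1 for x in range(x0, x1) if depth < zbuf[x])

def draw_scene(zbuf, objects):
    """objects: list of (name, bounding_box, triangle_count).
    Returns the names that were drawn and the number of triangles skipped."""
    drawn, skipped_triangles = [], 0
    for name, box, triangles in objects:
        if occlusion_query(zbuf, box) == 0:
            skipped_triangles += triangles      # the whole object is skipped
            continue
        drawn.append(name)                      # real code would submit geometry here
    return drawn, skipped_triangles

zbuf = [10.0] * 8 + [float("inf")] * 8          # a wall at depth 10 covers columns 0..7
objects = [
    ("statue_behind_wall", (2, 6, 25.0), 40000),
    ("tree_in_the_open", (10, 14, 30.0), 15000),
    ("crate_in_front", (1, 3, 4.0), 500),
]
print(draw_scene(zbuf, objects))
```

Here the statue's 40,000 triangles are never sent at all, which saves geometry work as well as pixel bandwidth.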
GeForce3 Lightspeed Memory Architecture
The GeForce3 uses a variety of techniques to increase the performance available from today's bandwidth-challenged graphics processors. High order surfaces are a big breakthrough in decreasing the bandwidth needed for geometry, and by tackling the pixel bandwidth problem GeForce3 makes great leaps forward in the efficient management of the available bandwidth.
All in all, these different techniques add up to the new GeForce3 Lightspeed Memory Architecture. For the first time, high resolution and high detail don't mean slow, unplayable frame rates.
Well, that about covers the GeForce3 memory architecture. Let's have a quick look at the other key features in GeForce3.
Nvidia nfiniteFX Engine
The Nvidia nfiniteFX engine gives developers the ability to program an almost infinite number of special effects and custom looks. Previously game developers chose from a limited selection of effects, so games often had the same sort of look. Now they can develop the look and feel that they want for their game, limited only by their imagination.
Programmable Vertex Shaders
The advances in 3D graphics technology have enabled games and 3D applications to become increasingly lifelike and realistic, and the introduction of GeForce3 with its nfiniteFX engine continues that trend. Programmable vertex shaders are a prime example of these new, more realistic 3D graphics. Previously, to get such effects it was necessary to use offline rendering, where a server farm would render the effect into an FMV sequence to be shown as a cut scene in the game. Now these special effects can be processed in real time during gameplay, enabling a much more lifelike gaming environment.
The vertex shader is like a special effects box for a graphics card: it can apply a number of different special effects to any given vertex. As the GeForce3 vertex shader is programmable it can perform countless different special effects, limited only by imagination. Here is a brief rundown of some specific effects that can be achieved:
- Complex Character Animation, including skin and clothing that bends and flexes realistically
- Fogging on selected objects, a heat wave in desert scenes for example
- Procedural Deformation, as an example making a flag wave convincingly in the wind
- Morphing, using different versions of an object the vertex shader can morph between the two
- Motion Blur, blurring an object to create the illusion of immense speed
- Lens Effects, such as fish eye and optical effects through water
- Custom Lighting effects, two sided lighting to realistically portray hollow objects
As the vertex shader is programmable it can be doing one job one second and a completely different operation a split second later. As such it is a very powerful tool for creating a much more realistic scene.
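To give a feel for what "programmable" means here, the Python sketch below treats a vertex program as a small function that is run once for every vertex, and swaps between two of the effects from the list above (procedural deformation and morphing). Real GeForce3 vertex programs are written in a low-level shader assembly, not Python, so the function names, parameters and flag mesh are purely illustrative.

```python
import math

def flag_wave(vertex, time, amplitude=0.2, wavelength=2.0, speed=3.0):
    """Procedural deformation: a travelling sine wave in z, pinned at the pole (x = 0)."""
    x, y, z = vertex
    phase = 2 * math.pi * x / wavelength - speed * time
    return (x, y, z + amplitude * x * math.sin(phase))

def morph(vertex_pair, blend):
    """Morphing: blend between the same vertex in two versions of a mesh."""
    start, end = vertex_pair
    return tuple(a + (b - a) * blend for a, b in zip(start, end))

def run_vertex_program(program, vertices, *constants):
    """The 'shader unit': applies whichever program is loaded to every vertex."""
    return [program(v, *constants) for v in vertices]

# A flat 5 x 3 flag mesh attached to a pole at x = 0 (made-up data).
flag = [(x * 0.5, y * 0.5, 0.0) for y in range(3) for x in range(5)]

waved = run_vertex_program(flag_wave, flag, 0.25)                  # one job...
halfway = run_vertex_program(morph, list(zip(flag, waved)), 0.5)   # ...then another
print(waved[4], halfway[4])
```

The same loop runs both effects. Only the program that is loaded changes, which is the flexibility the paragraph above describes.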
The picture below shows an example of procedural deformation: the effect of a bullet impact on a sheet of metal.
Pixel Shaders
The final output of any 3D graphics hardware consists of pixels. Depending on resolution, in excess of 2 million pixels need to be rendered, lit, shaded and coloured. Pixel shaders create lighting and custom shading effects at the pixel level, which is an unheard-of level of hardware control for consumers and developers.
3D graphics hardware has developed over the years as new features have been added, resulting in steady increases in graphics quality. In the early years of 3D acceleration, the best technology boasted bilinear filtering of fairly low resolution texture maps. These were mapped onto fairly large polygons, which resulted in a fairly blocky and lumpy 3D world. With the introduction of the GeForce2 came the Nvidia Shading Rasterizer (NSR).
The NSR gave developers, for the first time, the chance to use per-pixel lighting effects including dot3 bump mapping. Now, with the introduction of GeForce3 and its programmable pixel shader, developers have unprecedented control over each and every pixel, including lighting, shade and colour.
Architectural Details
The GeForce2 GTS performed dot3 and other per-pixel operations through its fixed function pixel pipelines. Textures could be combined, using either single pass or multi-pass techniques, to create the desired effects. There was no loop back mechanism, so effects that needed dependent texture reads, such as true reflective bump mapping, were simply not possible.
Programmers were somewhat limited by the fixed functions of the NSR pipeline, but it was still possible to create superb looking per-pixel effects including dot3 bump mapping, which looked very realistic and seemed to have a huge amount of geometric detail.
Nvidia nfiniteFX Programmable Pixel Shaders
The Nvidia nfiniteFX engine adds a high degree of flexibility for several reasons:
- It's programmable, allowing developers to create their own custom pixel shading effects
- There are more texture operations and texture address registers available
- Dependent texture reads are now possible, giving programmers even greater flexibility
The Nvidia pixel shader has very fast floating-point performance and is capable of creating per-pixel effects with superb performance. The picture below shows something that previously wasn't possible in real time animation.
The object is an animated, bumpy, reflective surface. This is a major feat in real time animation, and now the GeForce3 can do it. The GeForce3 can apply 4 textures in a single pass, and its excellent performance enables previously impossible per-pixel effects on consumer level platforms.
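A dependent texture read simply means that the result of one texture lookup is used to work out where to look in a second texture, which is what a bumpy reflective surface needs. Here is a minimal Python sketch of the idea. The nearest-neighbour sampler, the two tiny textures and the function names are assumptions for illustration and do not reflect how the hardware addresses textures.

```python
def sample(texture, u, v):
    """Nearest-neighbour lookup with coordinates in [0, 1)."""
    h, w = len(texture), len(texture[0])
    return texture[min(int(v * h), h - 1)][min(int(u * w), w - 1)]

def shade_pixel(u, v, offset_map, environment_map):
    """Two chained lookups: the second address depends on the first result."""
    du, dv = sample(offset_map, u, v)           # first read: per-pixel perturbation
    u2 = min(max(u + du, 0.0), 0.999)
    v2 = min(max(v + dv, 0.0), 0.999)
    return sample(environment_map, u2, v2)      # second read uses the perturbed address

# 2x2 map of (du, dv) offsets, standing in for a bump map (made-up data).
offset_map = [[(0.0, 0.0), (0.25, 0.5)],
              [(0.0, 0.0), (0.0, 0.0)]]
# 4x4 environment map whose texels encode their own position as 10 * row + column.
environment_map = [[10 * r + c for c in range(4)] for r in range(4)]

print(shade_pixel(0.1, 0.1, offset_map, environment_map))   # unperturbed pixel
print(shade_pixel(0.6, 0.1, offset_map, environment_map))   # perturbed pixel
```

On fixed function hardware the second lookup address has to be known before the first lookup happens, which is why this kind of effect needed the loop back mechanism mentioned earlier.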
Bump Mapping
Bump mapping creates the illusion of extra geometry on an object, such as crinkles in surfaces or bumps on a wall. It is supported in a few current games, as it was introduced a while ago on Matrox graphics cards. Bump mapping creates some very realistic surfaces in games, such as brick walls and doors with much better looking surfaces.
Dot3 bump mapping is derived from two key texture maps, the height map and the normal map, which define the surface geometry. The nfiniteFX engine can combine these two maps with all the required textures in one pass.
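The "dot3" in the name refers to a three-component dot product between the surface normal stored for a pixel and the direction to the light. The Python sketch below derives a normal map from a small height map and then lights it per pixel. The central-difference method for building the normals is a common approach that I am assuming here, and the height data and function names are made up.

```python
import math

def normalize(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def normal_map_from_height(height, strength=1.0):
    """Derive per-pixel normals from a height map by central differences (edges clamped)."""
    h, w = len(height), len(height[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            dx = height[y][min(x + 1, w - 1)] - height[y][max(x - 1, 0)]
            dy = height[min(y + 1, h - 1)][x] - height[max(y - 1, 0)][x]
            row.append(normalize((-strength * dx, -strength * dy, 1.0)))
        normals.append(row)
    return normals

def dot3(normal, light_dir):
    """The dot3 operation: clamped N . L gives the diffuse intensity for one pixel."""
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))

# A raised square bump in the middle of a flat surface (made-up data).
height = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
normals = normal_map_from_height(height)
light = normalize((1.0, 0.0, 1.0))      # light coming from the right
for row in normals:
    print([round(dot3(n, light), 2) for n in row])
```

Pixels on the side of the bump facing the light come out bright and those facing away come out dark, even though the underlying polygon is perfectly flat.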
The Nvidia nfiniteFX pixel shader is a big breakthrough in 3D graphics hardware, incorporating blazing fast floating-point performance, total control at the pixel level and programmable shaders. The nfiniteFX engine is the next step in bringing very high 3D image quality to the PC desktop (and let's not forget the GeForce3 on the Mac).
Well, that's the techy part over for now. Let's see what all this adds up to in real terms in today's games and benchmarks.