Technology II
So in terms of maximising your effective memory bandwidth, there's a lot that 'can' be done and a lot that actually 'is' done.
Starting at the lowest level, the memory controller is the interface between the GPU and card memory, and both Radeon 9700 Pro and GeForce4 Ti use a memory crossbar there to optimise reads from and writes to card memory.
The crossbar splits the memory bus up into manageable chunks. We'll talk about the Radeon 9700 Pro case since that's the card being reviewed, but the same concepts apply to the GeForce4 Ti because NVIDIA do a lot of similar stuff with their crossbar.
The Radeon 9700 Pro has a 256-bit memory bus, but often the card won't have 256 bits of data to move out to or back in from memory per clock cycle. On a normal controller without something like a memory crossbar, the controller would have to wait until there were 256 bits of information to transfer before the transfer could take place. If that were to take 8 memory clock cycles, that's 8 clocks of latency added at the memory controller before anything happened.
So the crossbar splits the memory bus into chunks (channels in ATI terminology), which in the Radeon 9700 Pro's case means 4 channels, each 64 bits wide. Any multiple of 64 bits up to 256 bits of data (64, 128, 192, 256) that is sitting on the memory controller in a given clock cycle can then be moved to or from card memory. You no longer need to wait for a full 256 bits of data to become available before anything can happen, which reduces memory controller latency and therefore increases effective memory bandwidth. The wider the memory bus, the greater the need for a crossbar or some other method of optimising memory accesses.
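To put some numbers on the waiting argument, here is a minimal sketch in Python. It is a toy model and not how ATI's controller actually schedules transfers: the function names and the workload of 32 bits arriving per clock are assumptions made purely for illustration. Only the 64-bit channel width and the 4 channels come from the text.

```python
CHANNEL_BITS = 64  # width of one crossbar channel (from the text)
CHANNELS = 4       # 4 x 64 bits = the 256-bit bus


def clocks_until_first_transfer(arrivals, granularity):
    """Return the 1-based clock on which the first transfer can start.

    arrivals: bits of data that become ready on each clock (hypothetical workload).
    granularity: smallest unit the controller can move, 256 for a single
    monolithic bus or 64 for a four-channel crossbar.
    """
    pending = 0
    for clock, bits in enumerate(arrivals, start=1):
        pending += bits
        if pending >= granularity:
            return clock
    return None  # never accumulated enough data to transfer


def channels_used(pending_bits):
    """How many 64-bit channels the crossbar could fill this clock (0 to 4)."""
    return min(pending_bits // CHANNEL_BITS, CHANNELS)


# Assumed workload: 32 bits become ready every clock, so 256 bits takes 8 clocks.
arrivals = [32] * 8
print(clocks_until_first_transfer(arrivals, 256))  # monolithic bus: 8
print(clocks_until_first_transfer(arrivals, 64))   # crossbar: 2
print(channels_used(192))                          # 3 channels busy
```

In this toy case the crossbar starts moving data on clock 2 rather than clock 8, which is the latency saving described above.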
Further up the low-level tree, at the level of card memory itself, the GPU works to limit overdraw. Overdraw is when objects in a scene or frame are drawn to the framebuffer even though you, the viewer, can't see them from your view of the frame. Think of it in terms of putting your hand over your left eye: your hand occludes (blocks) what's behind it and you can't see it, but it's still there. The same thing happens on the GPU. It draws information that you can't see, which is wasted work.
So the Radeon 9700 Pro uses a set of technologies to reduce overdraw. Drawing less to make a visible frame means it makes better use of that peak memory bandwidth figure, because redundant data isn't being drawn.
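Overdraw is easy to measure in a toy example. The sketch below is an illustration only, with a hypothetical `overdraw_factor` helper and a made-up scene: it counts how many pixel writes happen compared with how many distinct pixels end up on screen when nothing is rejected.

```python
def overdraw_factor(pixel_writes):
    """Average number of times each touched pixel was written.

    pixel_writes: iterable of (x, y) framebuffer writes in draw order,
    with no depth rejection at all (the worst case).
    """
    writes = 0
    touched = set()
    for x, y in pixel_writes:
        writes += 1
        touched.add((x, y))
    return writes / len(touched) if touched else 0.0


def quad(x0, y0, width, height):
    """All pixel coordinates covered by an axis-aligned rectangle."""
    return [(x, y) for y in range(y0, y0 + height) for x in range(x0, x0 + width)]


# Made-up scene: a 4x4 background, then a 4x4 wall drawn right over it.
scene = quad(0, 0, 4, 4) + quad(0, 0, 4, 4)
print(overdraw_factor(scene))  # 2.0: every visible pixel was drawn twice
```

An overdraw factor of 2.0 means half of the writes to the framebuffer were never seen, and that is bandwidth spent on nothing.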
These technologies are collectively called HyperZ because they operate on the Z-buffer (depth buffer). They include Z-buffer compression, where the contents of the depth buffer are compressed without losing information so that less card memory is used per frame. This has the knock-on effect (like all these technologies) of reducing the amount of data being moved to and from card memory, increasing effective bandwidth.
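To show what compressing without losing information can look like, here is a minimal sketch of one generic lossless approach: store a tile of depth values as a base value plus small per-pixel differences. This is an assumption for illustration only. ATI's actual compression scheme isn't described here, and the function names, the 8-bit delta size and the 24-bit depth size are all hypothetical.

```python
def compress_tile(depths):
    """Pack one tile of integer depth values.

    Returns ("delta", base, deltas) when every value is within 255 of the
    tile minimum, otherwise ("raw", depths). Both forms are lossless.
    """
    base = min(depths)
    deltas = [d - base for d in depths]
    if max(deltas) < 256:
        return ("delta", base, deltas)
    return ("raw", list(depths))


def decompress_tile(packed):
    """Recover the exact original depth values."""
    if packed[0] == "delta":
        _, base, deltas = packed
        return [base + d for d in deltas]
    return list(packed[1])


def packed_bits(packed, depth_bits=24):
    """Storage cost in bits, assuming 8-bit deltas and 24-bit raw depths."""
    if packed[0] == "delta":
        return depth_bits + 8 * len(packed[2])
    return depth_bits * len(packed[1])


# Assumed 8x8 tile covering a smooth surface: depths change slowly across it.
tile = [1000 + i for i in range(64)]
packed = compress_tile(tile)
print(decompress_tile(packed) == tile)     # True: nothing was lost
print(packed_bits(packed), "vs", 24 * 64)  # 536 vs 1536 bits
```

The idea being illustrated is that neighbouring depth values tend to be close together, so a tile can often be stored in far fewer bits and still be rebuilt exactly.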
Fast Z clear allows the GPU to quickly reset the Z-buffer to its cleared state before it's needed again. A conventional clear is a costly operation, since every value in the buffer has to be rewritten, especially when the bus width is large, so any technique that helps here is valuable.
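One common way to make a clear cheap is to keep a "cleared" flag per tile instead of rewriting every depth value. The sketch below assumes that approach purely for illustration; it isn't taken from ATI documentation, and the class name, the 8x8 tile size and the clear value passed in are all hypothetical.

```python
class TiledDepthBuffer:
    """Depth buffer split into tiles, each carrying a 'cleared' flag."""

    def __init__(self, width, height, clear_value, tile=8):
        self.width, self.height, self.tile = width, height, tile
        self.clear_value = clear_value  # depends on the depth convention in use
        self.tiles_x = (width + tile - 1) // tile
        self.tiles_y = (height + tile - 1) // tile
        self.depth = [clear_value] * (width * height)
        self.cleared = [True] * (self.tiles_x * self.tiles_y)

    def _tile_index(self, x, y):
        return (y // self.tile) * self.tiles_x + (x // self.tile)

    def _fill_tile(self, t):
        """Write the clear value into every pixel of tile t."""
        ty, tx = divmod(t, self.tiles_x)
        for y in range(ty * self.tile, min((ty + 1) * self.tile, self.height)):
            for x in range(tx * self.tile, min((tx + 1) * self.tile, self.width)):
                self.depth[y * self.width + x] = self.clear_value

    def fast_clear(self):
        """Flag every tile as cleared and return how many flags were written."""
        self.cleared = [True] * len(self.cleared)
        return len(self.cleared)

    def read(self, x, y):
        if self.cleared[self._tile_index(x, y)]:
            return self.clear_value  # no need to look at stored depths
        return self.depth[y * self.width + x]

    def write(self, x, y, z):
        t = self._tile_index(x, y)
        if self.cleared[t]:
            self._fill_tile(t)  # overwrite old values on first use of the tile
            self.cleared[t] = False
        self.depth[y * self.width + x] = z


buf = TiledDepthBuffer(640, 480, clear_value=1.0)
buf.write(10, 10, 0.5)
print(buf.fast_clear(), "flag writes vs", 640 * 480, "depth writes")
print(buf.read(10, 10))  # 1.0 again after the clear
```

In this sketch a 640x480 buffer is cleared with 4,800 flag writes rather than 307,200 depth writes, and the real per-pixel work is put off until a tile is actually drawn to.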
Lastly we have Hierarchical Z and Early Z. Both test depth ahead of the final write to the framebuffer (and on to your display device) so that hidden pixels can be discarded, the former working on blocks of pixels and the latter on individual pixels.
So HyperZ is a set of tricks that test against the Z-buffer to stop hidden pixels being drawn to the framebuffer, along with some optimisation techniques to get the most out of the data in the Z-buffer, again before anything gets drawn.
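The two tests can be sketched in a few lines of Python. This is a simplified illustration, assuming a "smaller depth is closer" convention and hypothetical function names. Real hardware keeps the per-tile farthest depth stored rather than recomputing it, and the exact ATI implementation isn't covered here.

```python
def hierarchical_z_reject(tile_max_depth, block_min_depth):
    """True if a whole block of pixels is hidden.

    The block's nearest point is no closer than the farthest depth already
    in the tile, so no pixel in the block can possibly be visible.
    """
    return block_min_depth >= tile_max_depth


def early_z_pass(stored_depth, fragment_depth):
    """Per-pixel test run before any expensive pixel work is done."""
    return fragment_depth < stored_depth


def draw_block(tile_depths, block_depths):
    """Draw a block into a tile and return how many pixels were written."""
    # Coarse test first: one comparison can throw away the entire block.
    # (max() is recomputed here for simplicity; hardware would cache it.)
    if hierarchical_z_reject(max(tile_depths), min(block_depths)):
        return 0
    written = 0
    for i, z in enumerate(block_depths):
        if early_z_pass(tile_depths[i], z):  # fine test, pixel by pixel
            tile_depths[i] = z
            written += 1
    return written


tile = [0.5] * 16                                # a wall already drawn at depth 0.5
print(draw_block(tile, [0.8] * 16))              # 0: block is entirely behind the wall
print(draw_block(tile, [0.2] * 8 + [0.9] * 8))   # 8: only the nearer half is drawn
```

The first call is rejected with a single comparison, while the second has to fall through to the per-pixel test, which then discards the half of the block that sits behind the wall.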
So with the GPU horsepower we discussed on the page before and the memory technologies on display on this page, you can see where the Radeon 9700 Pro gets its power.
Hopefully you can also envisage some situations where the card may not be that much faster than Ti4600, and others where it really pulls ahead of Ti4600 and puts your hard-earned money to work.
It's all about whether those faster situations are the ones you'll actually be using Radeon 9700 Pro for, and whether they make the purchase over Ti4600 worth it, since Ti4600 (and the other members of the GeForce4 Ti family) are still fast accelerators.
A quick word on what features we get on the GPU and then a look at how Sapphire does things with the card.