Global Illumination And Memory Compression
One of the great challenges in computer graphics remains accurate lighting. While textures and resolution have improved from generation to generation, getting close to photo-realistic imagery requires in-game lighting to mimic the natural environment, where light bounces off surfaces and textures at varying angles and casts a multitude of shadows. Comparing a photo with a computer game usually reveals the latter to be compromised by relatively poor lighting. The simple reason is that lighting is incredibly difficult to do well in real time.
Mimicking light accurately can, however, be accomplished in computer graphics much as it happens in real life, by calculating how millions of light rays bounce between the surfaces of a scene before finally reaching the eye. This approach to global illumination is known as path tracing. Modern graphics cards and CPUs are powerful enough to use path tracing to produce physically accurate images, often used in studio movies and professional applications, but the gaming industry is still some orders of magnitude away from integrating this computationally heavy approach into games running in real time, at high resolutions, and at acceptable framerates.
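To make the "many rays" idea concrete, here is a minimal Monte Carlo sketch of the estimator at the heart of path tracing, reduced to a single bounce. It fires random rays from one point on a flat, unoccluded surface under a uniform sky and averages what they see. The scene and the function name are hypothetical; a real path tracer intersects every ray with scene geometry and keeps bouncing.

```python
import math
import random


def estimate_irradiance(sky_radiance: float, num_rays: int, seed: int = 1) -> float:
    """Monte Carlo estimate of irradiance at a point on an unoccluded,
    upward-facing surface under a uniform sky of constant radiance.

    Exact answer: pi * sky_radiance. Each random ray contributes
    radiance * cos(theta) / pdf, with pdf = 1 / (2 * pi) for uniform
    hemisphere sampling.
    """
    rng = random.Random(seed)
    pdf = 1.0 / (2.0 * math.pi)
    total = 0.0
    for _ in range(num_rays):
        # Sampling the hemisphere uniformly by solid angle makes cos(theta)
        # uniformly distributed on [0, 1), so it can be drawn directly.
        cos_theta = rng.random()
        # In this toy scene every ray escapes to the sky; a real path tracer
        # would intersect geometry here and recurse from the hit surface.
        total += sky_radiance * cos_theta / pdf
    return total / num_rays


if __name__ == "__main__":
    for n in (16, 1_000, 100_000):
        est = estimate_irradiance(1.0, n)
        print(f"{n:>7} rays: {est:.4f} (exact {math.pi:.4f})")
```

The error of such an estimate shrinks only with the square root of the ray count, which is why converged, noise-free images demand so many rays per pixel.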
And since near-perfect global illumination remains a pipe dream for game developers, various crude hacks have been introduced to offer a semblance of lighting realism. You may see options such as ambient occlusion (AO), including horizon-based ambient occlusion (HBAO) and screen-space ambient occlusion (SSAO), in games' menus. Considered a poor man's substitute for true global illumination, AO is used because it's quick and relatively cheap to execute on a modern graphics card.
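Horizon-based AO estimates how much of the sky each point can see by searching outward for the steepest nearby occluder. The sketch below is a deliberately simplified, hypothetical 1D version working on a row of heights rather than a screen-space depth buffer, assuming a flat tangent plane at every sample and only two search directions; HBAO itself samples many directions per pixel and applies distance attenuation.

```python
import math
from typing import List


def horizon_ao(heights: List[float], spacing: float = 1.0, max_steps: int = 8) -> List[float]:
    """Toy 1D horizon-based ambient occlusion over a row of height samples.

    Returns a visibility value per sample: 1.0 = open sky, near 0 = enclosed.
    """
    n = len(heights)
    visibility = []
    for i in range(n):
        occlusion = 0.0
        for direction in (-1, 1):  # march left, then right
            horizon = 0.0  # steepest elevation angle seen so far (radians), flat tangent assumed
            for step in range(1, max_steps + 1):
                j = i + direction * step
                if j < 0 or j >= n:
                    break  # ran off the edge of the data: treat as open sky
                angle = math.atan2(heights[j] - heights[i], step * spacing)
                horizon = max(horizon, angle)
            # HBAO's per-direction occlusion term, sin(h) - sin(t), with tangent angle t = 0.
            occlusion += math.sin(horizon)
        # Average over the two directions, mirroring HBAO's average over directions.
        visibility.append(1.0 - occlusion / 2.0)
    return visibility


if __name__ == "__main__":
    # A flat floor with a narrow pit in the middle: the pit bottom should be darkest.
    floor = [5.0, 5.0, 5.0, 0.0, 5.0, 5.0, 5.0]
    print([round(v, 2) for v in horizon_ao(floor)])
```

Because it only reads nearby depth values, a pass like this is cheap, but it cannot know anything about light arriving from off-screen or from bounced surfaces.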
Coming back to the point, Cyril Crassin, an Nvidia employee since 2011, has helped pioneer a new form of global illumination called Voxel Cone Tracing (VCT). The aim is to improve the quality of in-game lighting without seriously impacting the framerate. The VCT algorithm uses a 3D data structure of voxels (volumetric pixels) to capture lighting information at every point in the scene. According to Nvidia, "the voxel can then be traced during the final rendering stage to accurately determine the effect of light bouncing around in the scene." To understand how it works in practice, head over to the video presented at Siggraph 2011.
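Crassin's technique pre-filters the voxelised scene into a mipmapped hierarchy (a sparse voxel octree in the original work) and marches cones through it, reading coarser levels as each cone widens and compositing front to back until the cone is blocked. The sketch below shrinks that idea to one dimension and one cone: a dense, hypothetical row of voxels storing premultiplied radiance and opacity, nearest-level sampling instead of smooth filtering between levels, and no step-length opacity correction. All names are illustrative; none of this is Nvidia's implementation.

```python
import math
from typing import List, Tuple

# One voxel: (premultiplied radiance, opacity in [0, 1]).
Voxel = Tuple[float, float]


def build_mips(level0: List[Voxel]) -> List[List[Voxel]]:
    """Pre-filter a row of voxels by averaging pairs, a 1D stand-in for a 3D mipmap."""
    n = len(level0)
    assert n > 0 and n & (n - 1) == 0, "toy version needs a power-of-two length"
    mips = [level0]
    while len(mips[-1]) > 1:
        prev = mips[-1]
        mips.append([((prev[k][0] + prev[k + 1][0]) / 2, (prev[k][1] + prev[k + 1][1]) / 2)
                     for k in range(0, len(prev), 2)])
    return mips


def trace_cone(mips: List[List[Voxel]], origin: float, half_angle: float,
               voxel_size: float = 1.0) -> Tuple[float, float]:
    """March one cone in the +x direction; return accumulated (radiance, opacity)."""
    extent = len(mips[0]) * voxel_size
    radiance, alpha = 0.0, 0.0
    t = voxel_size  # start one voxel out so the cone does not sample its own origin
    while alpha < 0.99:  # stop once the cone is effectively fully blocked
        pos = origin + t
        if pos >= extent:
            break
        # The cone's footprint widens with distance, never narrower than one voxel.
        diameter = max(voxel_size, 2.0 * t * math.tan(half_angle))
        # Choose the mip level whose voxels roughly match the footprint.
        level = min(int(math.log2(diameter / voxel_size)), len(mips) - 1)
        cell = int(pos / (voxel_size * 2 ** level))
        c, a = mips[level][cell]
        # Front-to-back compositing: nearer samples hide farther ones.
        radiance += (1.0 - alpha) * c
        alpha += (1.0 - alpha) * a
        t += diameter / 2.0  # step length grows with the footprint
    return radiance, alpha


if __name__ == "__main__":
    # Empty space with one bright, opaque voxel at the far end.
    row = [(0.0, 0.0)] * 7 + [(1.0, 1.0)]
    mips = build_mips(row)
    print("narrow cone:", trace_cone(mips, 0.0, math.radians(5)))
    print("wide cone:  ", trace_cone(mips, 0.0, math.radians(40)))
```

The narrow cone resolves the bright voxel at full strength, while the wide cone only sees it through a coarse, averaged level. That is the trade VCT makes: a handful of wide cones stands in for the many individual rays a path tracer would fire.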
Crassin first demonstrated this global illumination approach running in real time, albeit very slowly, on a GeForce GTX 480 in 2011 and, later, on a GTX 680. Nvidia has since optimised it for GeForce GPUs. VCT-based global illumination was supposed to be a feature of Unreal Engine 4, but it was dropped over concerns about its impact on framerate.
But Nvidia has added extra hardware to the GM204 Maxwell GPU to accelerate VCT further. The company claims Maxwell can speed up this form of global illumination by a factor of three compared with an older GeForce GPU, making real-time VCT practical for the first time. Epic appears suitably impressed with Nvidia's efforts to reinstate VCT in UE4, where it is likely to be a tickbox feature for these Maxwell GPUs.
We saw a VCT-based global illumination demo at an Nvidia event last week. Clearly a step up from other real-time lighting techniques, the overall effect is a solid halfway house between, say, HBAO and true path tracing. Nvidia is integrating Voxel Cone Tracing into GameWorks.
GTX 980: Numbers Don't Always Tell The Truth
Now take another look back at the spec table. GTX 980's 2,048 shaders perform much like 2,880 on Kepler, while the backend delivers over 300GB/s of effective bandwidth once the compression and cache are taken into account. Add in the doubling of the ROPs and the peak 1,216MHz core clock, and the result is a GPU that can beat the GTX 780 in every scenario.
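The headline claims above can be checked with simple arithmetic. This hedged sketch works out the peak FP32 rate at the 1,216MHz peak clock, the per-core uplift implied by "2,048 Maxwell shaders behave like 2,880 Kepler shaders", and the share of memory traffic that compression and caching would need to eliminate for the raw bus to behave like 300GB/s. The 256-bit, 7Gbps memory configuration is not stated in this section and is treated here as an assumption.

```python
# Figures quoted in the article.
GTX980_SHADERS = 2048
GTX980_BOOST_MHZ = 1216
KEPLER_EQUIVALENT_SHADERS = 2880  # "perform much like 2,880 on Kepler"
EFFECTIVE_BANDWIDTH_GBS = 300.0   # "over 300GB/s", taken as a round 300

# Assumption, not from the article: 256-bit bus with 7Gbps GDDR5.
BUS_WIDTH_BITS = 256
MEMORY_GBPS_PER_PIN = 7.0


def peak_fp32_gflops(shaders: int, clock_mhz: int) -> float:
    # Each CUDA core can issue one fused multiply-add (2 FLOPs) per clock.
    return shaders * 2 * clock_mhz / 1000.0


def raw_bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    # Bits per second across the whole bus, converted to bytes.
    return bus_bits * gbps_per_pin / 8.0


raw = raw_bandwidth_gbs(BUS_WIDTH_BITS, MEMORY_GBPS_PER_PIN)
per_core_factor = KEPLER_EQUIVALENT_SHADERS / GTX980_SHADERS
# Fraction of DRAM traffic that compression and cache must remove for raw to behave like 300GB/s.
traffic_saving = 1.0 - raw / EFFECTIVE_BANDWIDTH_GBS

print(f"Peak FP32 at boost: {peak_fp32_gflops(GTX980_SHADERS, GTX980_BOOST_MHZ):.0f} GFLOPS")
print(f"Implied per-core gain over Kepler: {per_core_factor:.2f}x")
print(f"Raw bandwidth: {raw:.0f}GB/s; implied traffic saving: {traffic_saving:.0%}")
```

The roughly 1.4x per-core gain and 25 per cent traffic saving are derived from the quoted figures, not measured; they are what the SMM refinements and compression would have to deliver for the article's equivalences to hold.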
We mentioned earlier that Maxwell's raison d'être is to do a lot more with fewer resources. Architectural refinements make Maxwell a high-performing GPU with a mainstream power requirement: GTX 980 is rated at just 165W, in line with today's mid-range cards. Nvidia knows that the GTX 980 outperforms the bigger-die GTX 780 and 780 Ti in most games, so it will come as no surprise that both GK110-based cards are being discontinued, replaced by GTX 980 at the head of the consumer pack.
With performance at GTX 780 Ti-like levels (we're getting ahead of ourselves), the $549 (£420) recommended retail price is consistent with the cheapest 780 Tis listed at retailers such as Newegg.
GTX 970: A Smaller Chip Off The Bigger Block
Expanding the reach of the GM204 die is the GeForce GTX 970. Performance is deliberately reduced by deactivating three of the GTX 980's 16 SMM units, dropping the shader count to 1,664 and the texture units to 104. The peak core frequency is also lowered to 1,178MHz, cutting the GPU's peak shader throughput by about 20 per cent. The cuts stop there; the backend remains exactly the same, so a reasonable guess puts gaming performance 10-15 per cent below the range-topping GPU.
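The "about 20 per cent" figure follows from multiplying active cores by peak clock. The sketch below assumes Maxwell's 128 CUDA cores and eight texture units per SMM (consistent with the totals quoted above) and ignores everything outside the shader array; the helper name is illustrative.

```python
MAXWELL_CORES_PER_SMM = 128
MAXWELL_TEXTURE_UNITS_PER_SMM = 8


def shader_throughput(active_smms: int, peak_clock_mhz: int) -> int:
    """Relative shader throughput: active CUDA cores multiplied by peak clock (MHz)."""
    return active_smms * MAXWELL_CORES_PER_SMM * peak_clock_mhz


gtx980 = shader_throughput(16, 1216)
gtx970 = shader_throughput(13, 1178)  # three of 16 SMMs disabled

print("GTX 970 shaders:", 13 * MAXWELL_CORES_PER_SMM)                # 1664
print("GTX 970 texture units:", 13 * MAXWELL_TEXTURE_UNITS_PER_SMM)  # 104
ratio = gtx970 / gtx980
print(f"GTX 970 vs GTX 980 shader throughput: {ratio:.3f} ({1 - ratio:.1%} lower)")
```

The result, roughly 21 per cent, is a shader-only ceiling. Because the backend is untouched, games limited by ROPs or memory should see a smaller gap, which is why the 10-15 per cent estimate sits below it.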
Having fewer cores on tap reduces power consumption even further, down to 145W, which is wholly impressive for what remains a high-end GPU. With AMD having become ultra-aggressive with the pricing of its Radeon R9 290 and 290X GPUs in recent weeks, Nvidia is using the GTX 970 to counter their price-to-performance appeal. Reference GTX 970 cards start at $329 (£250), rising to £300 for well-overclocked retail examples, so if the performance promise holds true, the lower-spec Maxwell part could set a new standard for its class. Nvidia has also decided to retire the GTX 770, leaving a space for future Maxwell-based GPUs to slot into.
Architecture Summary
The GM204 GPU powering the GeForce GTX 980 and GTX 970 graphics cards enables Nvidia to do more with less. Its performance comes from efficiency gains across the whole GPU compared with Kepler-powered parts. GM204 punches well above the weight suggested by its specifications while sipping less power than is normal for premium GPUs.
Judging GM204 by its specifications alone is akin to looking at the horsepower and torque of a sports car without taking weight and aerodynamics into account. GTX 980 and GTX 970 certainly don't have the biggest single-card engines around, but they make up for the shortfall in raw muscle with a nimble, elegant design.