Review: AMD Radeon HD 7970 3GB

by Tarinder Sandhu on 22 December 2011, 05:00 4.0

Tags: AMD (NYSE:AMD)

A tweak here, a tweak there

Zooming back out - feeding the beast up top

There's little point in having a clean, efficient core if it can't be fed quickly. At the top end of the GPU, AMD has beefed up the tessellation engine again, having already reworked it between the Radeon HD 5-series and Radeon HD 6-series. The latter introduced two fully-fledged tessellation engines that provided genuinely more tessellation throughput than before, and AMD now goes at it for a second time.

This time around, the two engines have been further tweaked with off-chip buffering improvements and better vertex re-use to provide, in a real-world sense, up to double the tessellation throughput. Games that rely on explosive tessellation will take to Tahiti, and AMD cites examples such as Total War: Shogun 2, Crysis 2 and Lost Planet 2 as being particularly partial to tessellation love.

All this work is then handled by two asynchronous compute engines that operate in conjunction with the main command processor. Just like Cayman, dual DMA engines provide a fast link to the host system, helped along by the next-gen PCIe 3.0 interface. Lots of bandwidth - in and out - for a bandwidth-focussed GPU.

And like Cayman, in a further nod towards general compute, the double-precision rate is one-quarter of single-precision speed, so almost 1TFLOPS DP, and there's full ECC support, too.
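The 'almost 1TFLOPS' figure falls out of simple arithmetic. Here's a rough sketch, assuming the card's published 2,048 stream processors and 925MHz engine clock (neither is quoted on this page) and counting one fused multiply-add as two operations per clock:

```python
# Back-of-the-envelope peak compute throughput for Radeon HD 7970.
# Assumptions (not stated on this page): 2,048 stream processors, 925MHz core clock.
STREAM_PROCESSORS = 2048
CORE_CLOCK_MHZ = 925
FLOPS_PER_CLOCK = 2   # one fused multiply-add counts as two operations
DP_RATE = 1 / 4       # double precision runs at one-quarter rate (from the text)


def peak_gflops(units, clock_mhz, flops_per_clock=FLOPS_PER_CLOCK):
    """Return peak single-precision throughput in GFLOPS."""
    return units * clock_mhz * flops_per_clock / 1000


sp_gflops = peak_gflops(STREAM_PROCESSORS, CORE_CLOCK_MHZ)
dp_gflops = sp_gflops * DP_RATE

print(f"SP: {sp_gflops:.0f} GFLOPS, DP: {dp_gflops:.0f} GFLOPS")
```

These are theoretical peaks; real code will land some way below them.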

Feeding the beast down below

Eight render back-ends house four ROPs each, leading to a fairly standard 32, though the ratio between render back-ends and memory interfaces is now lopsided - there are now '1.33' render back-ends to each 64-bit link compared to a 2:1 ratio on Cayman. Each ROP can process a colour pixel per clock cycle, so there may be occasions where total throughput is found wanting, given just how much power there is 'above' and 'below' the ROPs.
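The '1.33' figure works out when you count render back-ends, rather than individual ROPs, per 64-bit memory link. A quick sketch of the sums follows; Cayman's 256-bit bus and the 925MHz clock used for the fill-rate estimate are assumptions taken from the cards' published specifications, not from this page:

```python
# Render back-end to memory-channel ratios, Tahiti versus Cayman.
CHANNEL_BITS = 64
ROPS_PER_BACK_END = 4


def back_end_ratio(render_back_ends, bus_width_bits):
    """Render back-ends per 64-bit memory channel."""
    channels = bus_width_bits // CHANNEL_BITS
    return render_back_ends / channels


tahiti_rops = 8 * ROPS_PER_BACK_END    # eight back-ends, four ROPs each
tahiti_ratio = back_end_ratio(8, 384)  # 384-bit bus, per the review
cayman_ratio = back_end_ratio(8, 256)  # assumed: Cayman's 256-bit bus

# One colour pixel per ROP per clock; 925MHz core clock is an assumption.
peak_gpixels = tahiti_rops * 925 / 1000

print(f"Tahiti: {tahiti_rops} ROPs, {tahiti_ratio:.2f} back-ends per channel")
print(f"Cayman: {cayman_ratio:.2f} back-ends per channel")
print(f"Peak colour fill rate: {peak_gpixels:.1f} Gpixels/s")
```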

A 384-bit memory subsystem is expensive to implement from an engineering standpoint - power and complexity both go up - but that's AMD's problem, not yours, and we're impressed that Tahiti keeps a stock effective memory frequency of 5,500MHz.
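To put the wider bus into numbers, peak memory bandwidth is simply bus width multiplied by the effective data rate. A minimal sketch, where the Cayman comparison assumes that card's published 256-bit bus at the same 5,500MHz effective rate:

```python
# Peak memory bandwidth from bus width and effective memory frequency.
def memory_bandwidth_gbs(bus_width_bits, effective_mhz):
    """Return peak bandwidth in GB/s (bytes per transfer x transfers per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz / 1000


tahiti_bw = memory_bandwidth_gbs(384, 5500)  # figures from the review
cayman_bw = memory_bandwidth_gbs(256, 5500)  # assumed Cayman specification

print(f"Tahiti: {tahiti_bw:.0f}GB/s, Cayman: {cayman_bw:.0f}GB/s")
print(f"Uplift: {100 * (tahiti_bw / cayman_bw - 1):.0f} per cent")
```

With the frequency held constant, the whole of the uplift comes from the extra 128 bits of bus width.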

I'm not SAD, Radeon HD 7970 is

Segueing away from the main architecture, hidden away on the chip is a dedicated piece of logic used for evaluating the sum of absolute differences, or SAD for short. AMD bothers to do this because the SAD algorithm is vitally important for a bunch of media-related tasks. Put simply, SAD works by measuring the difference between one pixel and another, over and over and over again, and adding up the results.

By noting just how alike (or different) two pixels are and, in time, calculating the difference between two full-screen, megapixel images, SAD can be used to speed up common tasks such as gesture recognition, motion estimation and depth extraction, which all rely on pulling meaningful information from frames by comparing them. Radeon HD 7970 is reckoned to process 7.6 terapixels per second. We'll know more when we have a chance to run it against some common apps.
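For the curious, SAD itself is about as simple as algorithms get. The sketch below is a plain software illustration on small greyscale blocks, not a representation of AMD's hardware logic; the function names and sample pixel values are made up for the example. The second function shows the motion-estimation use case, picking whichever candidate block most closely resembles a reference block:

```python
# Sum of absolute differences (SAD) between blocks of greyscale pixel values.
def sad(block_a, block_b):
    """Add up the absolute per-pixel differences between two equally sized blocks."""
    if len(block_a) != len(block_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(block_a, block_b)
    ):
        raise ValueError("blocks must have the same dimensions")
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def best_match(reference, candidates):
    """Return the index of the candidate block with the lowest SAD score."""
    scores = [sad(reference, candidate) for candidate in candidates]
    return min(range(len(scores)), key=scores.__getitem__)


reference = [[10, 20], [30, 40]]
candidates = [
    [[12, 18], [33, 41]],    # close, but not identical
    [[10, 20], [30, 40]],    # identical block, so SAD is zero
    [[200, 200], [200, 200]],
]

print(sad(reference, candidates[0]))
print(best_match(reference, candidates))
```

A lower score means a closer match, which is why a motion estimator hunts for the minimum. Dedicated hardware earns its keep because this comparison has to be repeated for a huge number of blocks in every frame.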

ZeroCore power

We've told you a heap about how the Radeon HD 7970 Tahiti card works. The most-impressive metric, we believe, is just how much processing power AMD packs into a GPU that draws, on average, no more than 250W, though this figure can be adjusted in the PowerTune options of Catalyst Control Centre.

But idle power-draw is equally important for most people. GPGPU users may stress the card at all times but most gamers, no matter how avid, will have longer idle periods. A powerful card needs to be frugal when simply rendering 2D work. To this end, AMD's made a bunch of power-gating technologies available on the GPU, dropping idle power to around 15W from 20W or so on Cayman.

Going further, and rather cleverly, AMD introduces an advanced power-saving mode for when the PC is in what's termed a long idle state. This occurs when the framebuffer isn't being updated - the monitor has often switched off by this point - and there's little need to keep even the most basic GPU blocks active. In such a situation, AMD's ZeroCore power, controlled by the driver, switches off practically all of the GPU - 3D engine, memory interfaces, etc. - and leaves just a small block running whose job it is to remind the operating system that a GPU is still in the system.

This ZeroCore power, once applied by the driver, drops idle draw to 2.6W, low enough for the fan to switch off completely. Extending this further, additional cards in a CrossFire-enabled system can also be turned practically 'off' through the same procedure; they only wake up once the system fires up a game. Of course, the operating system needs to be efficient enough not to wake the GPU during a long idle state, especially when various widgets are running in the background, but it seems as if Windows 7 does a good job of idling down.
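To get a feel for what that saving means, here's a rough sketch comparing the quoted 15W idle draw with the 2.6W long-idle figure. The eight hours of long idle per day is a purely hypothetical usage pattern, so plug in your own number:

```python
# Rough energy saving from ZeroCore power versus ordinary idle.
IDLE_WATTS = 15.0             # regular idle draw quoted in the review
ZEROCORE_WATTS = 2.6          # long-idle draw quoted in the review
LONG_IDLE_HOURS_PER_DAY = 8   # hypothetical usage pattern, not a measured figure


def daily_saving_wh(idle_watts, zerocore_watts, hours):
    """Watt-hours saved per day by dropping into the long-idle state."""
    return (idle_watts - zerocore_watts) * hours


saving = daily_saving_wh(IDLE_WATTS, ZEROCORE_WATTS, LONG_IDLE_HOURS_PER_DAY)
print(f"{saving:.1f}Wh per day, {saving * 365 / 1000:.1f}kWh per year")
```

The saving multiplies for every extra card sat dormant in a CrossFire system.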

Architecture summary

We could write more, lots more, on the architecture, and we may take the opportunity later. But if you want one key takeaway without reading thousands of words, let it be this: the AMD Radeon HD 7970 packs considerably more transistors and processing power into a GPU that draws the same under-load power as a Radeon HD 6970.

That's made possible by the switch down to a 28nm process and further enhanced by a radical shift towards a GPGPU-optimised architecture, along with a more-refined tessellation engine and increased bandwidth from both discrete memory and the internal L1 and L2 caches. There is little reason to suspect it won't become the fastest single-GPU card around.

The price

OK, so we got this far without explicitly mentioning the oft-thorny subject of price. AMD reckons the initial batch of Radeon HD 7970 GPUs will retail for $549. If we do a back-of-the-envelope calculation and convert dollars to pounds and add on the dreaded VAT, the card should come in at below £450. Whatever the case, the benchmarks will inform us whether this pricing represents folly or real-world pragmatism by AMD.
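The envelope sums look something like the sketch below. The exchange rate is an assumed, illustrative figure rather than a quoted one, and retailers rarely convert this cleanly, so treat the result as a floor rather than a forecast:

```python
# Back-of-the-envelope conversion of the US launch price to a UK shelf price.
US_PRICE_USD = 549.0   # AMD's stated launch price
USD_TO_GBP = 0.64      # assumed exchange rate, for illustration only
UK_VAT_RATE = 0.20     # UK standard VAT rate


def uk_price(usd, usd_to_gbp, vat_rate):
    """Convert a pre-tax dollar price to pounds and add VAT."""
    return usd * usd_to_gbp * (1 + vat_rate)


estimate = uk_price(US_PRICE_USD, USD_TO_GBP, UK_VAT_RATE)
print(f"Estimated UK price: about {estimate:.0f} pounds including VAT")
```

On that assumed rate the straight conversion lands comfortably under 450 pounds, leaving some headroom for shipping and retailer margin.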