Core blimey. There be changes afoot
Core blimey! It's a 4D world now
AMD has reorganised the core layout of Cayman, too. Heading back a generation to Cypress, the Radeon HD 5870 is equipped with 20 SIMD arrays, each home to 16 units. Each unit, for want of a better word, is composed of what is known as a Very Long Instruction Word (VLIW) setup. These VLIWs are designed to take advantage of the parallelism that exists in GPU architectures by pushing multiple instructions to multiple cores, concurrently.
Cypress has a VLIW5 setup, meaning that each unit has five stream cores. Do the math and the Radeon HD 5870 has 1,600 shaders (20 x 16 x 5). AMD, though, has changed this to a VLIW4(D) setup, as shown on the top slide. Now each of the four Cayman cores can process the same instructions, rather than relying on the fifth 'fatter' core on Cypress for the 32-bit multiply-adds. In effect, AMD has farmed out the FP MAD to the other cores, and it reckons that doing it this way provides slightly better efficiency and, with respect to silicon costs, a reduction in the per-core die size.
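For anyone who wants to check the shader sums, here is a minimal sketch in Python. The Cypress figures (20 SIMD arrays, 16 units, VLIW5) come straight from the text above. The Cayman line assumes a full-fat part with 24 SIMD arrays, a figure that is not quoted in this section, so treat it as an assumption. The function name `stream_cores` is made up for illustration.

```python
def stream_cores(simd_arrays: int, units_per_simd: int, vliw_width: int) -> int:
    """Total stream cores = SIMD arrays x units per array x cores per VLIW unit."""
    return simd_arrays * units_per_simd * vliw_width


# Cypress (Radeon HD 5870): 20 SIMD arrays, 16 units each, VLIW5 (from the text).
cypress = stream_cores(simd_arrays=20, units_per_simd=16, vliw_width=5)

# Cayman: VLIW4 is from the text; 24 SIMD arrays is an ASSUMPTION for the
# top-end configuration and is not stated in this section.
cayman = stream_cores(simd_arrays=24, units_per_simd=16, vliw_width=4)

print(f"Cypress: {cypress} stream cores")  # Cypress: 1600 stream cores
print(f"Cayman:  {cayman} stream cores")   # Cayman:  1536 stream cores
```

Under those assumptions Cayman ends up with fewer cores on paper than Cypress, which is why AMD's efficiency argument for the narrower VLIW4 unit matters.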
But it's not a gaming GPU...honest
Using graphics chips for general-purpose work is a burgeoning segment. The sheer horsepower on tap can, in many cases, make it beneficial to run compute calculations on the GPU rather than the CPU. This is why both AMD and NVIDIA have professional cards aimed at this market, yet they're ostensibly designed around the base gaming GPU.
NVIDIA has made decent strides in this community with its professional cards. AMD wants in on this lucrative business. Cayman ships with a more-complete GPGPU specification than Cypress. There's now the ability to work on different types of instructions concurrently (see, Cayman borrows from Fermi's playbook again) and faster reads/writes. Significantly for applications that need greater accuracy in results, AMD's VLIW4 arrangement also boosts double-precision (64-bit FP) throughput from 1/5th of single-precision speed to 1/4. This is a big deal for certain markets, but won't make a jot of difference to Gary the Gamer. The HPC crowd, though, may well shun it because Cayman, unlike Fermi, doesn't support ECC memory.
We've made our way down to the last portion of the GPU, before the instructions are spat out to the frame-buffer and display.
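To put the 1/5 versus 1/4 ratio in perspective, here is a rough sketch of the peak-throughput arithmetic. It assumes one multiply-add (two floating-point operations) per stream core per clock, and it plugs in stream-core counts and core clocks (1,600 at 850MHz for Radeon HD 5870, 1,536 at 880MHz for a top-end Cayman) that are assumptions not quoted in this section. The helper `peak_gflops` is hypothetical, and real-world throughput will be lower than these paper figures.

```python
from fractions import Fraction


def peak_gflops(stream_cores: int, clock_mhz: int,
                dp_ratio: Fraction = Fraction(1)) -> float:
    """Theoretical peak GFLOPS, assuming one multiply-add (2 FLOPs) per core per clock.

    dp_ratio scales the single-precision figure down to double precision.
    """
    single_precision = Fraction(stream_cores * 2 * clock_mhz, 1000)
    return float(single_precision * dp_ratio)


# ASSUMED specs (not taken from this section of the article).
cypress_dp = peak_gflops(1600, 850, Fraction(1, 5))  # VLIW5: DP at 1/5 SP rate
cayman_dp = peak_gflops(1536, 880, Fraction(1, 4))   # VLIW4: DP at 1/4 SP rate

# Like-for-like gain from the ratio change alone (same cores, same clock).
ratio_gain = float(Fraction(1, 4) / Fraction(1, 5))

print(f"Cypress peak DP: {cypress_dp:.2f} GFLOPS")  # 544.00
print(f"Cayman peak DP:  {cayman_dp:.2f} GFLOPS")   # 675.84
print(f"Gain from ratio alone: {ratio_gain:.2f}x")  # 1.25x
```

The ratio change on its own is worth 25 per cent at equal single-precision throughput, and that gain is independent of the assumed clocks and core counts.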
What you need to know here is that AMD claims to have improved the texture-filtering speed for both INT8 and HDR formats on Cayman, as compared to Cypress. However, compared to NVIDIA's GTX 500-series, Cayman still filters FP16 textures at half INT8 speed, whereas NVIDIA's cards can run it at full chat. This will be telling in cases where the game uses high-dynamic range effects.
The last Cayman-only addition of note is what AMD terms Enhanced Quality AntiAliasing (EQAA). We all want better image quality through enhanced filtering but don't want to pay the computational tax for it. EQAA attempts to increase the quality of edge filtering by including what are known as coverage samples rather than additional colour samples. Increasing the number of colour samples is painful for the GPU, as it needs to store their results in memory, putting pressure on the card's memory bandwidth and frame-buffer capacity. This is especially true if using high-dynamic range formats.
Adding in coverage samples around the colour samples increases image fidelity - you've got more samples, right? - without overburdening the GPU's memory. This is possible because, while not absolutely free, coverage samples don't store the memory-filling colour and Z samples associated with regular AA. Bottom line: you get more IQ for a slight performance degradation.
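A back-of-the-envelope sketch shows why coverage samples are the cheaper route. Every size here is an illustrative assumption rather than an AMD figure: 4 bytes of colour and 4 bytes of Z per colour sample, a nominal 4 bits per extra coverage sample, and a 1,920x1,080 render target. The function `aa_sample_bytes` is hypothetical and ignores any compression the hardware applies.

```python
def aa_sample_bytes(width: int, height: int, colour_samples: int,
                    extra_coverage_samples: int = 0,
                    colour_bytes: int = 4, depth_bytes: int = 4,
                    coverage_bits: int = 4) -> int:
    """Rough per-frame sample storage in bytes (all sizes are ASSUMED).

    Colour samples carry colour and Z data; extra coverage samples carry
    only a few bits recording which colour sample they map to.
    """
    pixels = width * height
    colour_and_z = pixels * colour_samples * (colour_bytes + depth_bytes)
    coverage = pixels * extra_coverage_samples * coverage_bits // 8
    return colour_and_z + coverage


msaa_4x = aa_sample_bytes(1920, 1080, colour_samples=4)
msaa_8x = aa_sample_bytes(1920, 1080, colour_samples=8)
# Four colour samples plus four extra coverage samples per pixel.
coverage_aa = aa_sample_bytes(1920, 1080, colour_samples=4,
                              extra_coverage_samples=4)

for name, size in [("4 colour", msaa_4x), ("8 colour", msaa_8x),
                   ("4 colour + 4 coverage", coverage_aa)]:
    print(f"{name:>22}: {size / 2**20:.1f} MiB")
```

With these made-up sizes, doubling the colour samples doubles the storage, while adding the same number of coverage samples costs only a few per cent more. A 64-bit HDR colour format (`colour_bytes=8`) widens the gap further.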
Reckon you've heard all this before? You have, because NVIDIA introduced Coverage Sample AntiAliasing (CSAA), which is essentially the same edge-filtering improvement, in the GeForce 8-series GPUs.