APU 2014 - Part II

GCN to the APU

The larger update for Kaveri comes by way of improved graphics that use the GCN architecture. The move brings consistency across AMD's GPU product stack: Kaveri harnesses the same GCN architecture as the Hawaii-based Radeon R9 290(X) GPUs.

The top-line APU's graphics can legitimately be thought of as one-quarter of those found in the muscular 290X part - Kaveri uses eight Compute Units (CUs), each home to 64 shaders. Examining the composition in detail shows that each CU carries four 16-wide SIMD vector blocks, plus a scalar unit and registers in the middle, and a general, per-core scheduler at the top.

The total of 512 shaders and the move to the GCN architecture offer peak graphics performance that is up to 25 per cent faster than on the Richland APU, according to AMD, though the average frame-rate benefit is likely to be closer to 15 per cent once Kaveri's slower GPU clock of 720MHz is taken into account. Looking outside the GPU block, Kaveri also bakes in AMD's TrueAudio technology and the latest UVD block.
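The shader arithmetic above can be sketched in a few lines of Python. The CU count, SIMD layout and 720MHz clock come from the text; the figure of two floating-point operations per shader per clock (one fused multiply-add) is our assumption rather than something stated here, so treat the throughput number as a rough theoretical ceiling, not a measured result.

```python
# Figures from the article: 8 CUs, four 16-wide SIMD blocks per CU, 720MHz GPU clock.
COMPUTE_UNITS = 8
SIMDS_PER_CU = 4
LANES_PER_SIMD = 16
GPU_CLOCK_MHZ = 720

# Assumption (not from the article): each shader retires one fused
# multiply-add per clock, counted as two floating-point operations.
FLOPS_PER_SHADER_PER_CLOCK = 2


def shader_count(cus: int, simds_per_cu: int = SIMDS_PER_CU,
                 lanes: int = LANES_PER_SIMD) -> int:
    """Total shaders = CUs x SIMD blocks per CU x lanes per SIMD block."""
    return cus * simds_per_cu * lanes


def peak_gflops(shaders: int, clock_mhz: float,
                flops_per_clock: int = FLOPS_PER_SHADER_PER_CLOCK) -> float:
    """Theoretical single-precision peak in GFLOPS under the assumption above."""
    return shaders * flops_per_clock * clock_mhz / 1000.0


shaders = shader_count(COMPUTE_UNITS)
print(shaders)                              # 512
print(peak_gflops(shaders, GPU_CLOCK_MHZ))  # roughly 737 GFLOPS
```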

But there is a key addition for Kaveri over discrete GPUs based on Hawaii: shared, coherent, unified memory. Memory coherency is a means of keeping a shared memory pool up to date when more than one processor is working from it, ensuring that every core sees current data even when another core makes changes it would otherwise be oblivious to. Kaveri's trick is in having full memory coherency between the GPU and CPU cores for the first time on an APU. This leads us nicely on to Heterogeneous System Architecture (HSA).
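To make the idea of coherency concrete, here is a deliberately simplified toy model in Python. It is not a description of Kaveri's actual protocol: the `SharedMemory` and `CachingCore` classes, the write-through policy and the invalidate-on-write rule are all illustrative assumptions. It simply shows how one core can read stale data when nothing keeps private copies in step.

```python
class SharedMemory:
    """Toy shared memory pool that knows which cores are attached to it."""

    def __init__(self):
        self.data = {}
        self.cores = []


class CachingCore:
    """Toy core with a private cache; 'coherent' toggles invalidation on write."""

    def __init__(self, name, memory, coherent):
        self.name = name
        self.memory = memory
        self.coherent = coherent
        self.cache = {}
        memory.cores.append(self)

    def read(self, addr):
        # Serve from the private cache if present, else fetch from memory.
        if addr not in self.cache:
            self.cache[addr] = self.memory.data[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Write-through: update the private cache and the shared pool.
        self.cache[addr] = value
        self.memory.data[addr] = value
        if self.coherent:
            # Coherency step: drop every other core's now-stale copy.
            for other in self.memory.cores:
                if other is not self:
                    other.cache.pop(addr, None)


def gpu_read_after_cpu_write(coherent):
    memory = SharedMemory()
    memory.data[0x10] = 1
    cpu = CachingCore("cpu", memory, coherent)
    gpu = CachingCore("gpu", memory, coherent)
    gpu.read(0x10)       # GPU caches the old value
    cpu.write(0x10, 2)   # CPU updates the shared location
    return gpu.read(0x10)


print(gpu_read_after_cpu_write(coherent=False))  # 1 (stale)
print(gpu_read_after_cpu_write(coherent=True))   # 2 (current)
```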

HSA - bringing it all together

Changes made to the last-generation Trinity architecture provided a means by which the CPU and GPU cores could work far more closely than with Llano. AMD's engineers added an input/output memory management unit (IOMMU) that enabled the GPU portion of the chip to access the virtual address space, laying the foundation for sharing virtual addresses with the CPU.

Kaveri takes this sharing a step or two further. First, it adds a second bus between the IOMMU and GPU, which provides the above-mentioned coherency between the CPU and GPU. Second, it adds a feature called platform-level atomics, whose job it is to synchronise workloads between all the cores, be they CPU or GPU.
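The role of atomics in synchronising workers can be illustrated with a small Python sketch. This is only an analogy: Python has no hardware atomics, so the hypothetical `AtomicInt` class below uses a lock to stand in for an atomic compare-and-swap, and the two threads named "cpu" and "gpu" are ordinary CPU threads, not real devices.

```python
import threading


class AtomicInt:
    """Hypothetical atomic integer; the lock stands in for hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        # Succeeds only if nobody changed the value since it was read.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_increment(counter):
    # Classic retry loop: re-read and try again if another worker got in first.
    while True:
        seen = counter.load()
        if counter.compare_and_swap(seen, seen + 1):
            return


def run_workers(increments_per_worker=1000):
    counter = AtomicInt()

    def worker():
        for _ in range(increments_per_worker):
            atomic_increment(counter)

    threads = [threading.Thread(target=worker, name=name)
               for name in ("cpu", "gpu")]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.load()


print(run_workers())  # 2000: no increment is lost
```

The point of the retry loop is that neither worker ever overwrites the other's update, which is the guarantee atomics give to cores sharing one counter or work queue.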

Memory coherency and synchronisation - the key HSA features that Kaveri supports - are needed to make the GPU portion of the APU an equal partner to the CPU. For example, under HSA, the GPU doesn't need to have data copied before accessing it, can now access the same address space, and can take a peek into what the CPU is working on.
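The difference between copying data to the GPU and sharing an address space can be sketched with Python's built-in `bytearray` and `memoryview`. Everything here is an analogy running on the CPU: `gpu_kernel_double`, `copy_model` and `shared_model` are hypothetical names for illustration, not part of any HSA toolchain.

```python
def gpu_kernel_double(buffer):
    """Hypothetical stand-in for a GPU kernel: doubles each byte in place."""
    for i in range(len(buffer)):
        buffer[i] = (buffer[i] * 2) % 256


def copy_model(host):
    """Pre-HSA style: copy in, compute, copy back. Returns the copy count."""
    device = bytearray(host)   # copy host data into a separate 'device' buffer
    gpu_kernel_double(device)
    host[:] = device           # copy the results back to the host buffer
    return 2


def shared_model(host):
    """HSA style: the kernel works on the host buffer itself. No copies."""
    with memoryview(host) as view:  # a view of the same memory, not a copy
        gpu_kernel_double(view)
    return 0


data_a = bytearray([1, 2, 3])
data_b = bytearray([1, 2, 3])
print(copy_model(data_a), list(data_a))    # 2 [2, 4, 6]
print(shared_model(data_b), list(data_b))  # 0 [2, 4, 6]
```

Both paths produce the same result; the shared path simply skips the two transfers, which is the saving the text describes.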

The key benefit is that the GPU has the same level of overall system access previously enjoyed solely by the CPU - HSA rights a historical wrong that had relegated the GPU to an also-ran device in terms of programming. Really, the GPU is arguably more important than the CPU in a modern APU. In sum, HSA brings a cleaner way of apportioning workload to either the CPU or GPU through easier programmability, which is why AMD claims 12 compute cores (4 CPU and 8 GPU) for Kaveri.

Exploiting the efficiencies of HSA requires that software be aware of the easier-to-access parallelism, meaning that existing code needs to be reworked; AMD says that the HSA Foundation is working with many industry-standard languages to enable HSA acceleration. Java support, for example, is due in 2015.

The pragmatism - models, socket, pricing, performance, etc.

The fourth-generation A-Series Kaveri APUs use the FM2+ socket that is widely available on motherboards from all major manufacturers today. At launch, only two models will be present, the A10-7850K and A10-7700K, priced at $173 and $152, respectively. An A8-7600 will follow later.

Note that only the top-line part has the full complement of graphics CUs, as the A10-7700K drops the total number of shaders from 512 to 384. The lesser A10 is also slower with respect to CPU speed but still shares the same 95W TDP. We'd recommend that most users pony up the extra $21 and go for the faster part. Purchase either A10 and AMD includes a copy of Battlefield 4, downloadable via a free coupon code.
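For those who like to see the numbers side by side, the launch figures quoted above can be compared in a short Python sketch. The shader counts and prices come from the text; the derived CU count assumes the 64-shaders-per-CU layout described earlier, and the ratios say nothing about real-world performance.

```python
SHADERS_PER_CU = 64  # per the CU layout described earlier in the article

# Launch figures quoted in the article.
MODELS = {
    "A10-7850K": {"shaders": 512, "price_usd": 173},
    "A10-7700K": {"shaders": 384, "price_usd": 152},
}


def compute_units(shaders: int) -> int:
    """Derive the CU count from the shader count."""
    return shaders // SHADERS_PER_CU


def premium(top: str, lesser: str) -> dict:
    """Price gap and shader ratio between two models."""
    a, b = MODELS[top], MODELS[lesser]
    return {
        "extra_usd": a["price_usd"] - b["price_usd"],
        "shader_ratio": a["shaders"] / b["shaders"],
        "price_ratio": a["price_usd"] / b["price_usd"],
    }


for name, spec in MODELS.items():
    print(name, compute_units(spec["shaders"]), "CUs")

print(premium("A10-7850K", "A10-7700K"))
```

On these figures the faster part carries a third more shaders for a price increase of under 15 per cent.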

Those looking to build small-form-factor systems could consider the A8-7600, or use the Configurable TDP option of the new A10 APUs to run at 65W or even 45W - settings that are well suited to an SFF mini-ITX system.
