
Review: AMD's Kaveri APU examined

by Tarinder Sandhu on 14 January 2014, 13:00



A new CPU

Abridged history of APUs

AMD has been steadily improving its premium Accelerated Processing Units (APUs) since the launch of the Llano processor in June 2011. The first-generation APUs integrated AMD's K10 CPU cores and discrete-class Radeon HD 5000-series graphics on to a monolithic die, thus enabling mainstream desktops and laptops to be powered by a single processor.

A major APU update, codenamed Trinity, arrived almost a year later, this time with updated technology for both the CPU and GPU in the form of Piledriver cores and Radeon HD 6000 graphics, respectively, though the newer CPU architecture was often slower than the one it replaced. AMD, however, cemented its position as provider of best-in-class graphics through improvements to the GPU. Moving on another year to Richland, considered a minor refresh, AMD has arguably kept ahead of Intel's recent APU-like Core i3 and Core i5 processors in the all-important bang-for-buck metric... but the gap is closing.

The newest APU technology now resides in 'Kaveri'-based chips announced at CES last week. Keeping up with the times, AMD this time fundamentally upgrades the graphics portion of the APU to the GCN architecture found in all the latest discrete Radeon GPUs and consoles, whilst making incremental improvements to the CPU cores.

Brief APU comparison

Columns: APU Model | CPU Cores | CPU Tech | Max CPU Clock | GPU Cores | GPU Tech | Max GPU Clock | AMD Turbo Core | Form Factor
GPU Tech (Kaveri | Richland | Trinity | Llano): Radeon R7 | HD 6000 | HD 6000 | HD 5000

The high-level overview shows the key performance attributes of each AMD APU series. Let's take the improvements one by one and evaluate whether Kaveri APUs offer a worthwhile upgrade over last-generation Richland.

28nm, does it matter?

AMD's move down to a specific 28nm fabrication process has ramifications for the Kaveri APU beyond that of a smaller die. Joe Macri of AMD explained that previous APUs used silicon that was designed for frequency above density, a vestige of CPU design, thus optimising for MHz above parallelism by using speedy, low-metallised transistors. Now, as the GPU becomes more important - 47 per cent of the Kaveri die is devoted to it - and power is of greater concern, AMD, in conjunction with GlobalFoundries, is using an 'APU-optimised' process that offers a better compromise between all-out speed and the ability to make the APU's compute more parallel.

There are two key upshots from this. Firstly, the need to find a happy medium between performance, power and parallelism means this 28nm Super-High-Performance (SHP) process doesn't have the ability to scale the cores as high as on previous APUs. We can see this by looking at the maximum speeds of both; the peak frequencies of the CPU and GPU parts are lower than Richland's at a roughly equivalent TDP. But secondly, use of 28nm SHP also allows AMD to shoehorn in 512 graphics cores, which is comfortably more than on any previous all-in-one processor. AMD's adamant that this balanced design and wide dynamic range - the architecture has to fit into 15-95W TDPs - wouldn't have been possible without the substantial tweaking undertaken here.

And it's a big chip, too, weighing in at 2.41bn transistors, or over 1bn more than the Trinity/Richland chips it replaces. Kaveri and its predecessors share a die size of around 245mm², so not only does the 28nm process offer improvements in terms of parallelism, it is very much needed in order to keep manufacturing costs sensible. As you can imagine, most of this extra transistor budget is for the graphics cores.
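
As a rough back-of-envelope check on the figures quoted so far, the sketch below works out the implied average transistor density and the approximate die area given over to graphics. It assumes the 47 per cent GPU share applies to the 245mm² die area as a whole, and the names used are ours for illustration rather than anything from AMD.

```python
# Back-of-envelope arithmetic using only figures quoted in the article.
# Assumption: the 47 per cent GPU share is a share of total die area.
TRANSISTORS = 2.41e9      # Kaveri transistor count
DIE_AREA_MM2 = 245.0      # approximate die size in square millimetres
GPU_SHARE = 0.47          # fraction of the die devoted to the GPU


def density_millions_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return average transistor density in millions per square millimetre."""
    return transistors / area_mm2 / 1e6


def gpu_area_mm2(area_mm2: float, gpu_share: float) -> float:
    """Return the approximate die area occupied by the GPU."""
    return area_mm2 * gpu_share


if __name__ == "__main__":
    density = density_millions_per_mm2(TRANSISTORS, DIE_AREA_MM2)
    print(f"Average density: {density:.2f}M transistors per mm2")
    print(f"Approximate GPU area: {gpu_area_mm2(DIE_AREA_MM2, GPU_SHARE):.0f}mm2")
```

Treat the results as averages only: real density varies between cache, logic and graphics blocks.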

The steamin' CPU cores

It is normal for AMD to harness the latest CPU technology present in discrete, standalone CPUs and use it in subsequent APUs. Trouble is, there is no new technology on this front, with AMD's newest FX line of consumer CPUs still using the maligned Piledriver cores. Worse still, they won't be upgraded until 2015 at the very earliest, a tacit admission that development has truly stalled on this front. This then leads AMD to take the rather unusual step of debuting new CPU tech on an APU - Kaveri is the first chip to use the Steamroller core.

Steamroller is an enhanced version of the Piledriver core, just as Piledriver was an enhancement of the original Bulldozer found in the first-generation FX chips. The basic architecture topology remains intact, but AMD has made some key changes with respect to efficiency, particularly at the fetch and decode stages of the pipeline, in an effort to boost throughput by reducing bottlenecks and stalling at the start of the compute process.

Getting more granular, the instruction cache has been boosted by 50 per cent, to 96KB, reducing misses by up to 30 per cent. The extra cost of silicon is worth it, says AMD, because misses here really hamper pipeline execution. Mispredicted branches are also costly when processors become more parallel, so AMD doubles the size of the branch target buffer. The scheduler, too, is improved, with Steamroller upping Piledriver's 40 entries to 48. More is better because a larger scheduler enables the chip to be fed with instructions to a higher degree - efficiency by a different name.
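
The percentages in this paragraph can be cross-checked with a little arithmetic. The sketch below derives the instruction-cache size implied for the previous generation from the quoted 50 per cent increase, and works out the relative growth of the scheduler. The helper names are hypothetical and only the article's own figures are used.

```python
def size_before_increase(new_size: float, increase_pct: float) -> float:
    """Return the original size given the new size and a percentage increase."""
    return new_size / (1 + increase_pct / 100)


def percent_increase(old: float, new: float) -> float:
    """Return the percentage growth from old to new."""
    return (new - old) / old * 100


if __name__ == "__main__":
    # 96KB after a 50 per cent boost implies the previous cache size.
    previous_kb = size_before_increase(96, 50)
    print(f"Implied previous instruction cache: {previous_kb:.0f}KB")
    # Scheduler entries rise from 40 to 48.
    print(f"Scheduler growth: {percent_increase(40, 48):.0f} per cent")
```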

There are also two distinct integer schedulers and the ability to issue two stores at once, compared to one in the previous generation on each count. Looking at the backend, access to memory is improved by deepening the queues for load/stores, meaning that Steamroller can jump between main memory and the chip's registers more quickly than either Richland or Trinity.

What does all of this mean in terms of real-world processing? AMD believes the improvements add up to an average 10 per cent uplift over Piledriver-based Richland in instructions-per-cycle (IPC) throughput, peaking at 20 per cent for best-case scenarios. The uptick in performance is about what we expected from a revised, enhanced core, but AMD will continue to play catch-up to Intel's superior Haswell CPU architecture for some time to come: Steamroller isn't a silver bullet, it's a logical evolution of a below-par core.
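
Since throughput is, to a first approximation, IPC multiplied by clock speed, an IPC gain can be traded against a lower frequency. The sketch below assumes that simple model, ignoring memory and turbo behaviour, and shows how far a new chip's clock could fall below the old one's before AMD's claimed IPC uplift is cancelled out. The function names are illustrative.

```python
def relative_performance(ipc_gain_pct: float, clock_ratio: float) -> float:
    """Performance relative to the old chip, modelled as IPC x clock.

    clock_ratio is the new clock divided by the old clock.
    """
    return (1 + ipc_gain_pct / 100) * clock_ratio


def break_even_clock_ratio(ipc_gain_pct: float) -> float:
    """Clock ratio at which the IPC gain is exactly cancelled out."""
    return 1 / (1 + ipc_gain_pct / 100)


if __name__ == "__main__":
    for gain in (10, 20):  # AMD's average and best-case IPC claims
        ratio = break_even_clock_ratio(gain)
        print(f"{gain} per cent IPC gain breaks even at {ratio:.1%} of the old clock")
```

On this model a 10 per cent IPC gain is wiped out once the clock drops by roughly nine per cent, which is why the lower peak frequencies noted earlier matter.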