ARM rolled out a suite of premium mobile IP in February of this year, encompassing the Cortex-A72 CPU, Mali-T880 graphics, next-generation CCI-500 interconnect and Mali-V550 and DP550 video blocks.
The suite, or parts of it, is destined to power a swathe of premium smartphones and tablets in 2016, or even earlier if ARM's partners, as is their custom, race to outdo one another for first-to-market status. MediaTek, for example, has already announced the MT8173 SoC, which is powered by the new Cortex CPU cores.
ARM provided high-level details during that February announcement, deliberately holding the finer technical points back for a later date, and has since divulged juicier morsels during its annual Tech Day event held last week in London.
Cortex-A72 Explored
Mike Filippo, lead architect of Cortex-A72 and ARM Fellow, offered further insight into his team's objectives when designing the Cortex-A72 processor. The overarching remit, as Filippo explained, was to design a high-performance application processor that was also highly efficient in terms of both power and die area. Why? Because Cortex-A72 has been designed to fit into more than just mobile devices, with ARM targeting serious CPU horsepower for the thermally constrained environments often found in leading-edge automotive and server networking products.
Cortex-A72, then, has to fit into a multitude of usage scenarios, be speedy enough to stave off the burgeoning threat from Intel, yet power frugal enough to exploit new markets and opportunities. No easy task, huh? So how did Filippo's team, numbering around 60, try to achieve this balance?

Cortex-A72's architectural provenance harks back to the present Cortex-A57 that's found in a number of high-end devices. That is where the commonality ends, Filippo explained, because the new processor has a number of features that enable it to provide greater performance, in a 10 per cent smaller die area, while sipping the same amount of power.
Improving upon microarchitecture from one generation to the next is a question of picking off all of the low-hanging fruit - the features that give reasonable increases - and then poring over the small details that improve energy efficiency or performance by a touch here or there. 'One per cent improvements are big news in my world,' Filippo said, giving valuable insight into the ongoing quest for marginal gains from an architect's point of view.
The Low-Hanging Fruit

Let's work our way out of the middle. Cortex-A72 shares the same three-issue, in-order front end as the Cortex-A57 but, crucially, has a five-issue, out-of-order back end. This means, in high-level terms, that three instructions can be fed into the pipe concurrently and broken down into smaller operations (micro-ops), up to five of which can then be processed out of the back at once. A wider back end is generally considered better for improving the IPC (instructions per clock) of a core, particularly with the types of instructions and workloads presented to modern processors, so whilst there's a silicon cost to doing so, the performance advantages outweigh the negatives. This is one obvious way in which Cortex-A72 is faster than Cortex-A57.
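To make the front-end versus back-end distinction concrete, here is a minimal toy model of our own, not anything ARM has published. It assumes a two-stage pipe in which the front end decodes a fixed number of instructions per cycle and the back end dispatches a fixed number of micro-ops per cycle. The function name, the workloads and the three-wide baseline are all hypothetical, chosen only for illustration; real cores have dependencies, queues of finite size and many more stages.

```python
from collections import deque


def cycles_to_drain(uops_per_instruction, decode_width, dispatch_width):
    """Count cycles for a toy two-stage pipe to dispatch every micro-op.

    Each cycle the back end first dispatches up to `dispatch_width`
    micro-ops that are already queued, then the front end decodes up to
    `decode_width` instructions and queues the micro-ops they produce.
    This is an illustrative assumption, not a model of any real core.
    """
    pending = deque(uops_per_instruction)  # instructions not yet decoded
    queued = 0                             # micro-ops waiting for dispatch
    cycles = 0
    while pending or queued:
        cycles += 1
        queued -= min(queued, dispatch_width)
        for _ in range(min(decode_width, len(pending))):
            queued += pending.popleft()
    return cycles


# Hypothetical workload: 12 instructions that each split into 2 micro-ops.
split_heavy = [2] * 12
# Hypothetical workload: 12 instructions that map to 1 micro-op each.
simple = [1] * 12

for name, workload in (("split-heavy", split_heavy), ("simple", simple)):
    narrow = cycles_to_drain(workload, decode_width=3, dispatch_width=3)
    wide = cycles_to_drain(workload, decode_width=3, dispatch_width=5)
    print(f"{name}: 3-wide dispatch {narrow} cycles, 5-wide dispatch {wide} cycles")
```

In this toy, the wider back end only pays off when instructions split into more micro-ops than a three-wide dispatch stage can absorb; with one micro-op per instruction the three-wide front end is the limit either way.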
Coming back to the front, the branch-prediction unit is always a key area for a CPU architect. Spending time and resource on this section improves the processor's ability to streamline the flow of instructions going into the fetch, decode and dispatch areas. ARM says the all-new unit is 20 per cent better than the one found on Cortex-A57 and, as an ancillary benefit, enables the overall design to be more energy efficient due to fewer mispredictions. Filippo noted that he was more than pleased with the way the branch prediction unit turned out.
The Cleaning Crew
A roster of other improvements has been implemented to reduce latency, increase bandwidth, and optimise power in every part of the design. Teams were set up to examine each portion of the core in detail, to simplify the design and to conserve power. The connecting caches were also redesigned, idle power was reduced through further optimisation, and particular emphasis was placed on improving the core features - integer, memory, cryptography, and so on - that determine straight-line performance.
It struck us that ARM is pursuing an agenda of doing more with less, in much the same way that, broadly speaking, Intel and Nvidia have focussed on energy efficiency before all-out performance. A very power-frugal design usually has the associated benefits of a smaller die and lower overall costs of production.
For those of us coming from a PC background and writing for enthusiasts, the Cortex-A72 is the Maxwell to Cortex-A57's Kepler - basically the same blueprint that's gone through the reworking wringer many times over. The new product is leaner, more efficient and able to address a larger part of the overall market.
Performance Matters
No hardware architect is going to talk about the non-optimal stuff in their prized designs, so ARM provided a bunch of comparative data, against its own Cortex-A57 core, showing just where Cortex-A72 sits in the overall scheme.

Compared on an iso-process basis - which takes frequency and node out of the equation - Cortex-A72 is predictably impressive. We're used to seeing single-digit gains from one desktop chip to the next, so a 20 per cent across-the-board improvement is considered good for a high-end core.
As part of an SoC, the Cortex-A57 is today found in popular platforms such as the Qualcomm Snapdragon 810 and Samsung Exynos 5433, usually in an eight-core configuration alongside the Cortex-A53. On the same process node and at close-to-maximum frequency, the Cortex-A72 is expected to be around 35 per cent faster, with some of that gain coming from the aforementioned architecture improvements and some from higher frequency that, in turn, is enabled by a more energy-efficient core.
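ARM did not break down how much of the 35 per cent comes from frequency, but a rough back-of-the-envelope split is possible. The sketch below rests on two assumptions of ours: that performance scales linearly with clock speed, and that the architectural and frequency gains multiply together. The function name is hypothetical.

```python
def implied_frequency_gain(total_gain, architecture_gain):
    """Return the clock uplift implied if the two gains multiply.

    Gains are fractions, so 0.35 means 35 per cent faster. The
    multiplicative model is an assumption, not ARM's stated method.
    """
    return (1.0 + total_gain) / (1.0 + architecture_gain) - 1.0


# 35 per cent overall and 20 per cent at iso-frequency, as quoted by ARM.
freq_gain = implied_frequency_gain(total_gain=0.35, architecture_gain=0.20)
print(f"Implied frequency uplift: {freq_gain:.1%}")
```

Under those assumptions the clock-speed contribution works out to 1.35 / 1.20 - 1, or roughly 12.5 per cent.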

So, yes, Cortex-A72 is smaller, faster and more energy efficient than its immediate predecessor, but we expected nothing less. The real question isn't whether ARM's new design is better than its previous one, which is a given, but, rather, how it compares against Intel's latest mobile hardware. The threat is external, not internal.
Core Blimey

The above ARM-provided slide attempts to compare the relative performance of an Intel Broadwell M-class processor to the supposed performance of a particular Cortex-A72 configuration. There are more provisos here than any reasonable man can shake a stick at, but the figures do provide some guidance on the relative strength of the ARM architecture once thermal constraints, which are very much relevant in the target markets, are put in place.
ARM reckons that, at one-quarter of the power, its four-core Cortex-A72 is a little behind a Broadwell M in single-threaded applications but ahead once all cores are tasked. Memory performance is broadly similar.
That's one heck of a claim to make, intimating your design IP remains massively more energy efficient than the very best technology from the world's largest and arguably most advanced semiconductor company. The message is, in effect, 'We can do with one watt what you can do with four,' which is impressive fighting talk, though such comparisons are moot for now because ARM Cortex-A72 silicon will not be shipping for a while yet.
ARM has come out swinging with the new Cortex-A72 core that'll inevitably replace the present Cortex-A57 as the go-to choice for a number of SoC designers. The new core is better than its predecessor in every respect, and good enough, or so say the benchmarks, to give the Intel Broadwell a bloody nose on an energy footing, so we wonder how many handset and tablet makers will skip the Cortex-A57 and hurry along designs with the updated CPU core.
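The arithmetic behind the one-watt-versus-four line is simple enough to sketch. The only figure taken from ARM here is the one-quarter power budget; the relative performance values of 0.9 and 1.2 are hypothetical placeholders of ours for 'a little behind' and 'ahead', since no exact scores were quoted, and the function name is likewise our own.

```python
def relative_efficiency(relative_performance, relative_power):
    """Performance per watt of chip A relative to chip B.

    Both arguments are ratios of A to B, so 1.0 means parity.
    """
    return relative_performance / relative_power


ARM_RELATIVE_POWER = 0.25  # "one-quarter the power", per ARM's slide

# Placeholder performance ratios, NOT measured or quoted results.
scenarios = {
    "single-threaded (hypothetical 0.9x)": 0.9,
    "parity (1.0x)": 1.0,
    "multi-threaded (hypothetical 1.2x)": 1.2,
}

for label, perf in scenarios.items():
    ratio = relative_efficiency(perf, ARM_RELATIVE_POWER)
    print(f"{label}: {ratio:.1f}x performance per watt")
```

At exact performance parity the ratio is four, which is where the fighting talk comes from; anything short of parity trims the advantage proportionally.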
It's difficult not to come away impressed by the performance and energy efficiency of the next-generation CPU core from ARM. Let's now wait and see how Intel responds.
