Architecture Improvements
Haswell's primary ambition may be to drive down power consumption, but it wouldn't be a new Intel architecture without performance improvements on both the CPU and GPU fronts.
Knowing that Haswell borrows heavily from previous-generation Intel architectures, it's no surprise to see that common features such as Hyper-Threading, Turbo Boost and IGP-enabled QuickSync are all still here. This is very much a case of evolution as opposed to revolution, and underneath the hood, Haswell merely builds on the foundations laid by earlier Core microarchitectures.
Let's start with a very basic refresh. In simple terms, remember that the CPU's goal is to fetch instructions in order, decode them into micro-ops (uops), execute them in any order and then commit them to memory. This instruction cycle is integral to the basic workings of any modern computer, and processors employ a technique called pipelining to boost instruction throughput.
The pipeline isn't a physical structure in the obvious sense of the term, but rather a specialised set of hardware and queues designed to allow the chip to process multiple instructions at once, through the pipe. Conroe (2006) made use of a 14-stage pipeline, Nehalem (2009) saw the pipeline deepened to 16 stages, and in Sandy Bridge (2011) Intel introduced a decoded uop cache to make the current 14-stage pipeline more efficient than ever before.
The front-end of the pipeline is tasked with fetching instructions and providing work for the back-end, and up front, Haswell offers no major changes, with the bulk of the upgrades taking place in the back-end where micro-ops are executed.
One of Haswell's most striking enhancements is that the number of back-end execution ports has been upgraded from six to eight. This is the first increase in execution ports we've seen since Conroe, and the new additions give Haswell a fourth port for math logic and a fourth for store address calculations.
Ensuring that the wider back-end is well fed, Intel has also made further branch prediction improvements to ensure fewer cache misses, which can be very problematic with respect to pipeline efficiency. As you might expect, faster buffers and faster access to L1 memory are also a priority. Each Haswell core continues to be served exclusively by a 64KB L1 cache and a 256KB L2 cache, with a third last-level cache of up to 8MB shared across all cores, the graphics processor and system agent, just like Ivy Bridge.
This time around, L1 store bandwidth is doubled from 16 bytes/cycle to 32 bytes/cycle, and interface bandwidth between the L1 and L2 caches has doubled from 32 bytes/cycle to 64 bytes/cycle. Although Haswell is officially described as new micro-architecture, it's clear that it has a lot in common with earlier generations and the focus has been on getting the most out of a tried-and-trusted system. Again, that word efficiency crops up again.
Making the execution ports all the more potent, Intel's also taking this opportunity to roll out its AVX2 instruction set, with support for Fused Multiply-Add (FMA). Tapping in to these new capabilities, execution ports 0 and 1 are equipped with 256-bit FMA units, allowing them to execute multiply and addition in a single calculation.
This will, in theory, double the peak floating-point throughput of Haswell, but as with most instruction set enhancements, the real-world performance increase will rely heavily on optimised code.
Intel has gone to great lengths to make the Haswell architecture more efficient, however increasing performance has become less of a priority. Having maintained a comfortable lead over AMD in recent years, there's little need to ramp-up performance dramatically, and as things stand, consumer Haswell chips won't exceed the dual- or quad-core configurations we've become accustomed to. Intel could, quite easily, launch bigger, faster processors, but focus has shifted and, while internal performance improvements are being made, efficiency really is the real name of the game.
Case in point, Intel is keen to emphasise a 'better user experience.' Instead of rolling out benchmarks and record performance numbers, as has historically been the case, Haswell is best described as a faster, leaner architecture that Intel hopes will enable it to thrive in modern-day form factors.
Though, on the subject of performance, there is one area in which Intel is promising massive gains...