Overview
This HEXUS.help guide gives a brief overview of Intel's new 'Nehalem' generation of Central Processing Units (CPUs).
The technology and how it works
Intel's Nehalem design can be seen as an evolution of the current Core 2 design, with a number of existing bottlenecks remedied.
Improved Input/Output
The memory controller has been repositioned from the northbridge into the CPU, reducing memory-access latency and doubling available bandwidth compared to dual-channel Core 2 CPUs. This has been done by removing the bottleneck imposed by the front-side bus, where any information to and from the CPU had to be sent through a separate chip on the motherboard.
As if that were not enough, the highest-end parts will also move from dual- to triple-channel memory, meaning that memory bandwidth, an important determinant of overall performance, has been increased by over 50 per cent compared to what's available from Intel's consumer CPUs today.
System-wide communication has also been targeted for improvement. The high-speed, serial QuickPath Interconnect (QPI) replaces the FSB for communication with the northbridge and, in multi-socket systems, with other processors. This mainly affects the professional and server product lines, with high-end desktops the only consumer parts to use QPI.
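To put a rough number on the channel increase, here is a minimal sketch of the usual peak-bandwidth arithmetic: channels multiplied by transfers per second multiplied by bytes per transfer. The 1,066MT/s memory speed is purely an illustrative assumption, not a figure from Intel, and the function name is our own. At an identical memory speed, going from two channels to three gives exactly 50 per cent more theoretical bandwidth; any gain beyond that would have to come from faster memory.

```python
def peak_bandwidth_gbs(channels, megatransfers_per_s, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    Assumes each memory channel is 64 bits wide, as on standard DDR DIMMs.
    """
    bytes_per_transfer = bus_width_bits // 8
    return channels * megatransfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Illustrative assumption: the same 1,066MT/s memory in both configurations.
dual = peak_bandwidth_gbs(channels=2, megatransfers_per_s=1066)
triple = peak_bandwidth_gbs(channels=3, megatransfers_per_s=1066)
increase_pct = (triple / dual - 1) * 100

print(f"Dual-channel:   {dual:.1f} GB/s")
print(f"Triple-channel: {triple:.1f} GB/s")
print(f"Increase:       {increase_pct:.0f} per cent")
```

These are theoretical ceilings only; real-world throughput will be lower.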
If a lot of these features sound familiar, it's because they are. Rival AMD's CPUs have featured an integrated memory controller and high-speed serial interface (HyperTransport) since the launch of the K8-generation Opteron and Athlon 64 products in 2003.
The similarities don't stop there. AMD's current K10 Opteron and Phenom X4 products feature a three-level cache hierarchy, a feature Nehalem also adopts: small yet fast 64KB L1 and 256KB L2 caches dedicated to each core, and a slower but vastly larger L3 cache (8MB in initially shipping products) shared between the cores.
With the design optimised for highly multithreaded environments, servers and high-performance computing look set to benefit the most. For general desktop use, the higher cache latencies and lower per-core capacity may partly offset the multithreaded performance improvements.
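As a quick back-of-the-envelope check, the sketch below totals the on-die cache using the per-core and shared sizes quoted above. The quad-core count is an assumption taken from the initial parts discussed in this guide, and the helper name is our own.

```python
KB = 1024
MB = 1024 * KB


def total_cache_bytes(cores, l1_per_core, l2_per_core, l3_shared):
    """Sum the private per-core caches and the single shared L3."""
    return cores * (l1_per_core + l2_per_core) + l3_shared


# Assumption: a quad-core part with the cache sizes quoted in the text.
total = total_cache_bytes(cores=4, l1_per_core=64 * KB,
                          l2_per_core=256 * KB, l3_shared=8 * MB)

print(total // KB)          # 9472 (KB in total)
print((8 * MB // 4) // MB)  # 2 (MB of L3 per core, if shared evenly)
```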
Core Improvements
The aforementioned design changes focussed on removing bottlenecks in feeding the processing core with information, but Intel has also paid attention to improving the processing cores themselves, beyond the already-class-leading Core 2.
The design focus is on maximising efficiency, and Hyper Threading has been re-introduced from the Pentium 4 days. Whilst the implementation isn't quite the same, the overall effect is, bringing eight-thread computing to a quad-core Nehalem processor.
Further, macro-op fusion is extended to cover 64-bit code, and a new Power Control Unit (PCU) looks to make the most effective use of processor resources.
Hyper Threading - or Simultaneous Multi-Threading (SMT) as it is known for Nehalem - allows instructions to be drawn from two threads simultaneously. A four-core Nehalem appears to the operating system as an eight-core processor, with each physical processor core appearing as two virtual cores.
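The way the operating system sees such a chip can be sketched as follows. This is only an illustration of the counting: the `logical_processors` helper is hypothetical, whereas `os.cpu_count()` is Python's standard call for reporting the number of logical processors on the machine running the script.

```python
import os


def logical_processors(physical_cores, threads_per_core=2):
    """List the (core, thread) pairs an OS would schedule work onto."""
    return [(core, thread)
            for core in range(physical_cores)
            for thread in range(threads_per_core)]


# A quad-core part with two hardware threads per core.
layout = logical_processors(physical_cores=4)
print(len(layout))  # 8
print(layout[:4])   # [(0, 0), (0, 1), (1, 0), (1, 1)]

# Logical processor count of whatever machine runs this script.
print(os.cpu_count())
```

Note that the two virtual cores share one physical core's execution resources, so eight threads will not deliver the throughput of eight true cores.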
It is reckoned that the performance increase due to Hyper Threading is greater than the relative increases in die size or power consumption, meaning the chip does more per clock than existing Core 2 or, for that matter, AMD's Phenom X4.
The integrated Power Control Unit (PCU) dynamically monitors and adjusts the clock speeds and voltages of each processing core. Cores can be clocked independently, or shut off entirely when not in use. In addition to reducing clock speeds and voltages when cores are idle, a new 'Turbo' mode will allow clock speeds to increase beyond the rated frequencies when the processor is within the rated thermal and power envelopes, increasing single- or multi-threaded performance. We reckon these changes will increase energy efficiency whilst maintaining performance-on-demand.
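The decision the PCU makes for each core can be pictured with a toy model. To be clear, this is not Intel's algorithm: the function, its parameters, the 133MHz step and the example wattages and temperatures are all hypothetical, chosen only to illustrate the idea of switching off idle cores and boosting busy ones while there is thermal and power headroom.

```python
def target_clock_mhz(rated_mhz, active, power_w, temp_c,
                     power_limit_w, temp_limit_c,
                     step_mhz=133, turbo_steps=1):
    """Toy per-core clock decision (hypothetical, for illustration only)."""
    if not active:
        return 0  # idle core is shut off entirely
    has_headroom = power_w < power_limit_w and temp_c < temp_limit_c
    if has_headroom:
        # Within both envelopes: run above the rated frequency.
        return rated_mhz + turbo_steps * step_mhz
    return rated_mhz  # at a limit: stay at the rated frequency


# Busy core with headroom (all figures are made-up examples).
print(target_clock_mhz(2666, True, power_w=90, temp_c=60,
                       power_limit_w=130, temp_limit_c=80))   # 2799
# Busy core with the package already at its power limit.
print(target_clock_mhz(2666, True, power_w=130, temp_c=60,
                       power_limit_w=130, temp_limit_c=80))   # 2666
# Idle core.
print(target_clock_mhz(2666, False, power_w=90, temp_c=60,
                       power_limit_w=130, temp_limit_c=80))   # 0
```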
The market
The Nehalem family will span desktops, workstations, servers and notebooks, and will be available, by December 2008, both in ready-built systems and as standalone products for self-builds.
The first products to launch will be the Core i7 desktop line, based on the 'Bloomfield' core, targeting high-end performance systems. A server part, currently codenamed 'Nehalem-EP', will also be released this year.
Later, in 2009, the Nehalem family will expand with the mainstream quad-core, SMT-enabled 'Lynnfield' and dual-core, four-thread-capable 'Havendale' desktop CPUs, along with their notebook counterparts 'Clarksfield' and 'Auburndale,' respectively.
These lower-cost, mainstream parts lose the QuickPath interface, communicating directly with the core logic via a DMI (Direct Media Interface) link instead. They also integrate the PCIe controller on-die and reduce the number of memory channels from three to two. There will also be variants of 'Havendale' and 'Auburndale' with integrated graphics cores, and the graphics portion will ship on what is called a multi-chip module, where two dies are put on the same CPU package, much like Intel's Core 2 Quad today.
Later, this will be followed by the 'Nehalem-EX' part for the expandable server market, offering eight-core (16 threads) compute ability, a quad-channel FB-DIMM memory controller and a mammoth 24MB L3 cache - thereby completing Intel's rollout of the Nehalem family across product lines.
The players and competition
Intel's biggest competitor in the CPU arena is AMD, whose Opteron series competes in the server and workstation spaces, Phenom line on the desktop, and Turion X2 line in notebooks.
On the desktop, then, Nehalem will be squaring off with both AMD's current 65nm 'Agena' core and its forthcoming 45nm 'Deneb' successor. It is expected that AMD, whilst not competing in terms of absolute performance, will be competitive, cost for cost, in the lower-price ranges.
Since the introduction of the Core 2 processor, Intel has been following a tick/tock strategy, with the tick representing the shrink of an existing architecture, plus a few enhancements, to a new process node, and the tock being the introduction of a new architecture on the now-proven manufacturing process.
As such, Nehalem will be competing not only with products from AMD but also with the present Core 2-based Penryn architecture that underpins most of the shipping Intel desktop and server CPUs today.
Summary
Intel's Nehalem family of processors addresses a number of weaknesses found in the Core 2 architecture - especially in terms of scalability and input/output performance - whilst continuing the drive for ever-higher performance-per-watt.
However, the majority of the changes are targeted towards the heavily multi-threaded workloads of workstations and servers, and may prove less of a benefit in the consumer space.
The Nehalem family, when formally released at the end of the year, looks to offer the highest-performance CPUs available, yet both AMD's Phenom and Intel's current Core 2 families will continue to be attractive to certain segments of the market for some time to come.