Broadwell-EP uncovered
Technology giant Intel has multiple processor families for the mobile, client and server segments. Each is home to various chips based on recent architectures, and these include Skylake, Broadwell and Haswell, in reverse chronological order, spanning the last three years.
What's interesting is that Intel uses different architectures for different segments. The consumer desktop and mobile line is spearheaded by the sixth-generation Skylake architecture whereas, until yesterday, the premium server line, known as Xeon E5, was powered by the fourth-generation Haswell. The slower pace of architecture evolution for the server and workstation can be explained by the two-fold reasoning of Intel having very little genuine competition in the area and a need for platforms that last multiple years.
Today, though, Intel is changing it up by releasing a slew of new Xeon E5 chips equipped with Broadwell technology. They aren't the first Xeons to do so - that accolade goes to the SoC-based Xeon D and Xeon E3 v4 from last year - but by updating to Broadwell across its most popular mainstream processors Intel is adding more performance and more cores for the same financial outlay.
The simplest way to examine what's new is to look at the high-level improvements in the architecture as it pertains to Xeon E5 v3 (Haswell) and E5 v4 (Broadwell).
Haswell-EP vs. Broadwell-EP high-level overview
The Broadwell-EP processors, to give them their full codename, use the same Socket R3 as their Haswell predecessors. This means that the E5-2600 v4 processors will be a drop-in upgrade once the latest BIOS has been installed on an Intel C610-based motherboard. The first digit, which remains '2' in this case, denotes that up to two processors can be installed on a single motherboard. Intel will launch the E5-4xxx v4 chips in due course.
A benefit of Broadwell's 14nm manufacturing process is a smaller die and, one would assume, a lower cost of manufacturing should yields be similar to Haswell. Putting this into numbers, a Haswell-EP E5-26xx v3 chip measures 661mm² and packs in 5.56bn transistors. A similar Broadwell-EP E5-26xx v4 processor shrinks the die to 455mm² even as the transistor count rises to 7.2bn, translating into a healthy die-space saving for Intel.
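To put the die-space saving into perspective, the short Python sketch below works through the arithmetic using only the area and transistor figures quoted above. The function and variable names are our own, chosen purely for illustration. It suggests a die that is roughly 31 per cent smaller with about 1.9 times the transistor density.

```python
# Die figures as quoted in the article (area in mm², transistors in billions).
HASWELL_EP = {"area_mm2": 661, "transistors_bn": 5.56}
BROADWELL_EP = {"area_mm2": 455, "transistors_bn": 7.2}


def area_saving_pct(old, new):
    """Percentage reduction in die area when moving from old to new."""
    return (old["area_mm2"] - new["area_mm2"]) / old["area_mm2"] * 100


def density_mtr_per_mm2(chip):
    """Transistor density in millions of transistors per square millimetre."""
    return chip["transistors_bn"] * 1000 / chip["area_mm2"]


saving = area_saving_pct(HASWELL_EP, BROADWELL_EP)
density_ratio = density_mtr_per_mm2(BROADWELL_EP) / density_mtr_per_mm2(HASWELL_EP)

print(f"Die area saving: {saving:.1f} per cent")
print(f"Haswell-EP density: {density_mtr_per_mm2(HASWELL_EP):.1f} MTr/mm²")
print(f"Broadwell-EP density: {density_mtr_per_mm2(BROADWELL_EP):.1f} MTr/mm²")
print(f"Density ratio: {density_ratio:.2f}x")
```

Bear in mind this compares the two largest dies only; it says nothing about yields or the actual cost per chip.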
The best of the 28 Haswell-EPs is the E5-2699 v3, housing 18 cores and able to process 36 threads. The 145W chip operates at 2.3GHz and has a complicated range of Turbo bins that scale up to 3.6GHz for regular applications and 3.3GHz for harder-hitting AVX workloads. Meanwhile, the best of the Broadwell-EP breed, the E5-2699 v4, packs in 22 cores and 44 threads, so more compute power, but has a slightly lower peak AVX speed. We'll discuss model-to-model differences on the following page.
Enhancements in pure architecture offer a ballpark five-to-10 per cent performance increase for Broadwell-EP, according to Intel's own internal numbers, with the exact gain dependent upon how well the application taps into the Broadwell blueprint. Part of this gain is also attributable to Intel increasing the officially supported memory speed to 2,400MHz, up from 2,133MHz, though just as with Haswell, Broadwell's memory controller runs slower if three DIMMs are used per channel.
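As a rough illustration of what the memory-speed bump is worth, the Python sketch below estimates theoretical peak bandwidth per socket. It rests on assumptions not stated above: that the quoted speeds are effective transfer rates (MT/s), that each DDR4 channel is 64 bits wide, and that each socket has four memory channels. Treat the results as upper bounds rather than measured throughput.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data path per DDR4 channel
CHANNELS_PER_SOCKET = 4  # assumption: quad-channel memory controller


def peak_bandwidth_gbs(mega_transfers_per_s, channels=CHANNELS_PER_SOCKET):
    """Theoretical peak memory bandwidth per socket, in GB/s."""
    bytes_per_second = mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels
    return bytes_per_second / 1e9


haswell_bw = peak_bandwidth_gbs(2133)
broadwell_bw = peak_bandwidth_gbs(2400)
uplift_pct = (broadwell_bw / haswell_bw - 1) * 100

print(f"Haswell-EP (2,133): {haswell_bw:.1f} GB/s")
print(f"Broadwell-EP (2,400): {broadwell_bw:.1f} GB/s")
print(f"Uplift: {uplift_pct:.1f} per cent")
```

Under these assumptions the theoretical uplift is about 12.5 per cent, which only bandwidth-bound applications are likely to approach.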
In total Intel is launching almost 30 E5-2600 v4 Broadwell-EP processors that offer a modicum of extra performance when compared to the incumbents on a core-to-core basis and, at the top end, include more cores and threads than ever before.
Broadwell-EP architecture improvements, additional software
Readers already familiar with the desktop Broadwell processor will have a good idea what the new architecture brings to the table over and above Haswell, but it's worth reiterating for these mainstream workstation and server chips.
The new features, highlighted on the left, are well-known. What's more relevant are the extras that Broadwell brings to the table. Intel keeps the same 2.5MB of last-level cache per core, meaning the 22-core part is equipped with a substantial 55MB. There's support for through-silicon via (TSV) DIMMs, more commonly known as 3DS, and, as an extra, added memory reliability through CRC. The CRC protection is not usually needed in servers, but is useful if there's a high soft error rate.
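The cache figure follows directly from the core count: 55MB across 22 cores works out to 2.5MB per core. The minimal Python sketch below applies that ratio to the two flagship core counts mentioned in this article. It assumes every core keeps its full cache slice enabled, which may not hold for every model in the range, and the function name is our own.

```python
MB_PER_CORE = 2.5  # last-level cache per core, derived from 55MB / 22 cores


def last_level_cache_mb(cores, mb_per_core=MB_PER_CORE):
    """Total last-level cache, assuming a full slice is enabled per core."""
    return cores * mb_per_core


for name, cores in (("E5-2699 v3", 18), ("E5-2699 v4", 22)):
    threads = cores * 2  # two threads per core, as on both flagship parts
    cache = last_level_cache_mb(cores)
    print(f"{name}: {cores} cores, {threads} threads, {cache:.0f}MB cache")
```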
Architecturally, the ring topology that connects cores to memory and PCIe remains intact, and it takes the form of dual rings on the high-core-count processors, 1.5 rings on the medium-core-count processors and a single ring on the low-core-count ones.