Faster Northwood and another Extreme CPU
Intel Pentium 4 3.4GHz Extreme Edition
In contrast to the Prescott, the Pentium 4 3.4GHz Extreme Edition is a known quantity. Intel's design brief was to create an enhanced Northwood processor without resorting to any fundamental architectural changes. That precluded switching manufacturing processes or, as the Prescott does, altering the pipeline length. Intel came up with the expected solution, which was to bump up the CPU's on-chip cache.
In another oversimplification, a CPU spends most of its time idling whilst system memory gets its act together and provides the data that has been requested. We reviewed some Corsair XMS4400 memory recently. It ran at 550MHz DDR, and that's about as fast as you can currently buy in quantity. The Pentium 4 runs in excess of 3GHz. There's more to it than this simple comparison, but the point stands: having to retrieve data from main system memory is pathetically slow in CPU terms. Adding more CPU cache, therefore, is an obvious, if expensive, method of speeding up the whole data acquisition process.
The regular Pentium 4 Northwood uses 8kb of L1 data cache nearest to the CPU, 12kuops of L1 execution cache (whose contents don't require decoding into the CPU's language - a nice architectural touch), and 512kb of L2 cache. The question may be asked as to why CPU manufacturers don't just lump, say, 64MB of cache directly on to the CPU. There's no arguing that it would help performance a great deal. However, consider this. The Pentium 4 Northwood carries around 55-million transistors. The Pentium 4 Prescott, with 1MB of L2 cache and 16kb of L1 data cache, is reckoned to weigh in at around 125-million transistors. Cache is extremely costly in die size and transistor usage, so the CPU design team has to weigh cost against performance.
The Pentium 4 Extreme Edition was engineered without cost being the primary concern. Therefore 2MB of L3 cache has been added to the die. As an aside, L3 cache isn't quite as fast as the cache closer to the core, but it's still way, way faster than having to drag the data from main memory. The extra 2MB of cache helps the CPU in scenarios where quick and repeated access to memory is required. That sounds like gaming to us. Load the dataset, or a large portion of it, into the CPU's cache and watch it fly. The transistor count is reckoned to sit at around 170 million, with a 237mm² die size. It's big, it's fast, and it's expensive - it's often referred to as the Expensive Edition.
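As a rough sanity check on that transistor count, the sketch below estimates how many transistors 2MB of cache needs. It assumes a conventional six-transistor (6T) SRAM cell per bit and counts the data array only, ignoring tags, error correction and control logic. The 6T figure and the function name are our assumptions, not Intel's published breakdown.

```python
# Back-of-envelope estimate only.
TRANSISTORS_PER_BIT = 6  # assumption: conventional 6T SRAM cell


def cache_data_transistors(cache_kb: int, per_bit: int = TRANSISTORS_PER_BIT) -> int:
    """Transistors in the cache data array (no tags, ECC or control logic)."""
    bits = cache_kb * 1024 * 8
    return bits * per_bit


northwood_core = 55_000_000                 # figure quoted in the article
l3_array = cache_data_transistors(2048)     # 2MB of L3 cache
estimate = northwood_core + l3_array

print(f"L3 data array:  {l3_array / 1e6:.1f} million transistors")
print(f"Northwood + L3: {estimate / 1e6:.1f} million transistors")
```

Under these assumptions the data array alone comes to roughly 100 million transistors, and the total lands in the mid-150 millions. The remaining gap to the quoted figure would plausibly be tags, error correction and supporting logic, though we can't confirm the exact split.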
Other than the cache issue and 3.4GHz operating speed, there's little to separate it from the rest of the Northwood set of CPUs.
A Gallatin Xeon that's been spanked into Northwood form. A review of the original 3.2GHz Extreme Edition CPU can be found here.
The 3.4GHz Extreme Edition on the right. The 3.2GHz regular Northwood on the left. You wouldn't think that one is 3x the price of the other.
Intel Pentium 4 3.4GHz Northwood CPU
Coming to the end of its reign as Intel's favoured consumer-level CPU is the Northwood core. It's served Intel well, running up from a 1.6GHz version on a 100MHz FSB to the present 3.4GHz variant on a 200MHz FSB. Overall performance has improved with the introduction of dual-channel chipsets that give the quad-pumped Pentium 4 much-needed bandwidth. 3.4GHz will be the Northwood's final outing. Intel has decided that it won't carry the Northwood core over to the revised LGA775 packaging for the next iteration of Prescotts. The 3.4GHz model does, however, feature the same attributes that have made the Northwood such a popular CPU. Further Northwood information can be found in our review here.
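To put a number on why dual-channel memory suits the quad-pumped bus, here is a minimal sketch of the peak theoretical bandwidth on each side. It assumes a 64-bit (8-byte) data path for both the front-side bus and each DDR channel. The function name is ours, and these are peak figures rather than measured throughput.

```python
# Peak theoretical bandwidth in MB/s (1 MB = 10**6 bytes).
BUS_WIDTH_BYTES = 8  # assumption: 64-bit data path for the FSB and each DDR channel


def peak_bandwidth_mb_s(clock_mhz: int, transfers_per_clock: int, channels: int = 1) -> int:
    """Clock rate x transfers per clock x bus width x number of channels."""
    return clock_mhz * transfers_per_clock * BUS_WIDTH_BYTES * channels


fsb = peak_bandwidth_mb_s(200, 4)                # 200MHz quad-pumped FSB
single = peak_bandwidth_mb_s(200, 2)             # DDR400, single channel
dual = peak_bandwidth_mb_s(200, 2, channels=2)   # DDR400, dual channel

print(f"FSB:                 {fsb} MB/s")
print(f"DDR400 single-chan.: {single} MB/s")
print(f"DDR400 dual-chan.:   {dual} MB/s")
```

On paper, a single DDR400 channel can feed only half of what the bus can carry, whereas two channels match it exactly.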
We'll add a brief table that highlights the processors' qualities:
Name | Pentium 4 3.2GHz Northwood | Pentium 4 3.4GHz Northwood | Pentium 4 3.4GHz Extreme Edition | Pentium 4 3.2GHz Prescott | AMD Athlon 64 Model 3400+ | AMD Athlon 64 FX-51 |
Clock speed | 3200MHz | 3400MHz | 3400MHz | 3200MHz | 2200MHz | 2200MHz |
L1 cache | 20kb* | 20kb | 20kb | 28kb* | 128kb | 128kb |
L2 cache | 512kb | 512kb | 512kb | 1024kb | 1024kb | 1024kb |
L3 cache | - | - | 2048kb | - | - | - |
FSB | 200MHz quad pumped | 200MHz quad pumped | 200MHz quad pumped | 200MHz quad pumped | 2200MHz (core speed) | 2200MHz (core speed) |
Pipeline length | 20 stages | 20 stages | 20 stages | 31 stages | 12 stages | 12 stages |
Socket | S478 | S478 | S478 | S478 (for now) | S754 | S940 |
Manufacturing process | 0.13-micron | 0.13-micron | 0.13-micron | 0.09-micron | 0.13-micron | 0.13-micron |
Transistor count | 55 million | 55 million | 169 million | 125 million | 106 million | 106 million |
CPU die size | 127mm² | 127mm² | 237mm² | 112mm² | 193mm² | 193mm² |
Voltage | ~1.525v* | ~1.525v | ~1.525v | 1.25 - 1.4v | 1.5v | 1.5v |
Memory support (now) | DDR400 DC | DDR400 DC | DDR400 DC | DDR400 DC | DDR400 SC | DDR400 DC ECC |
Other | HT Tech | Last NW | 2MB L3 cache | SSE3 | On-die controller | 32/64-bit |
* - The Pentium 4 Northwood has 8kb of L1 data cache and 12kuops L1 trace execution cache. The Prescott has 16kb of L1 data and the same 12kuops.
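Using the transistor counts and die sizes from the table above, the following sketch works out how densely each chip is packed. The grouping of the two Athlon 64 models into one entry and the helper name are our own choices for illustration.

```python
# Transistor count (millions) and die size (mm^2), copied from the table above.
chips = {
    "Pentium 4 Northwood": (55, 127),
    "Pentium 4 Extreme Edition": (169, 237),
    "Pentium 4 Prescott": (125, 112),
    "Athlon 64": (106, 193),
}


def density(transistors_m: float, die_mm2: float) -> float:
    """Millions of transistors per square millimetre of die."""
    return transistors_m / die_mm2


densities = {name: density(*spec) for name, spec in chips.items()}

# Print from least to most densely packed.
for name, value in sorted(densities.items(), key=lambda item: item[1]):
    print(f"{name:<26} {value:.2f} million transistors per mm^2")
```

By these figures the 0.09-micron Prescott packs the most transistors into each square millimetre. The Extreme Edition also comes out denser than the plain Northwood on the same process, which would fit with cache being laid out more tightly than core logic, though that reading is ours.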
The question that's been on our lips is whether the Prescott's technological advantages can make up for the performance shortfall that a much longer pipeline will inevitably create. We're about to find out.