
Review: AMD Athlon XP 3200+

by Ryszard Sommefeldt on 13 May 2003, 00:00

Tags: AMD (NYSE:AMD), NVIDIA (NASDAQ:NVDA)



Chip specifics




The processor itself is a 54.3 million transistor behemoth with a 101mm² die, up from 84mm² on Thoroughbred. That puts its transistor count on a par with the latest Pentium 4, which weighs in with 55 million transistors on a larger 145mm² die.

Of course, the extra transistor count is down to the doubling of level 2 cache size from 256KB to 512KB. With level 1 cache unchanged at 128KB (split evenly between instruction and data caches), you have a total of 640KB of embedded memory on the processor die, 576KB of which (the 64KB level 1 data cache plus the 512KB level 2) is available for fast access to data.

That compares to the latest Pentium 4 consumer processors, which also have 512KB of level 2 cache but only 8KB of dedicated level 1 data cache, plus a 12K micro-op trace cache to keep the front end of the Pentium 4's long 20-stage pipeline busy. Given that the 8KB L1 data cache is mirrored in L2, you only have 512KB of total data cache on a Pentium 4, with only 8KB of it accessible very quickly.
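The cache sums above are easy to check with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, and it assumes the Athlon XP's L2 does not duplicate what is in L1 (which is what the 576KB figure implies), while the Pentium 4's L2 mirrors its L1 data cache as described above.

```python
def effective_data_cache_kb(l1_data_kb, l2_kb, l2_mirrors_l1):
    """Unique data capacity in KB (hypothetical helper for this article).

    If L2 mirrors L1, the L1 contents are duplicates and add no capacity;
    otherwise the two levels hold different data and their sizes add up.
    """
    return l2_kb if l2_mirrors_l1 else l1_data_kb + l2_kb


# Athlon XP: 128KB L1 split evenly into instruction and data, 512KB L2.
athlon_l1_total_kb = 128
athlon_l1_data_kb = athlon_l1_total_kb // 2
athlon_l2_kb = 512

athlon_on_die_kb = athlon_l1_total_kb + athlon_l2_kb
athlon_data_kb = effective_data_cache_kb(
    athlon_l1_data_kb, athlon_l2_kb, l2_mirrors_l1=False
)

# Pentium 4: 8KB L1 data cache mirrored in a 512KB L2.
p4_data_kb = effective_data_cache_kb(8, 512, l2_mirrors_l1=True)

print(athlon_on_die_kb, athlon_data_kb, p4_data_kb)
```

The trace cache is left out on purpose, since it holds decoded micro-ops rather than data.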

I mentioned the 20-stage pipeline on the Pentium 4 (a figure that excludes an undisclosed set of decode stages). That compares to the shorter 10 or 15 stage pipeline on the Athlon XP, depending on whether the integer or floating point unit is used. A longer pipeline means higher attainable clock frequencies but less work done per clock cycle. Conversely, a shorter pipeline means less clock frequency headroom but more work done per clock cycle.

In easier-to-digest English, the P4 clocks higher but does less per clock, while the Athlon XP clocks less impressively but does significantly more work per clock cycle (especially when using its integer math units).
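To put a rough shape on that trade-off, you can treat throughput as clock frequency multiplied by work done per clock (IPC). The sketch below is a deliberately crude model, not a benchmark: the function names are invented for illustration, and the two clock speeds are placeholder values chosen only to show the sums, not measurements from this review.

```python
import math


def relative_throughput(clock_hz, ipc):
    """Crude model: work per second = cycles per second x work per cycle."""
    return clock_hz * ipc


def ipc_ratio_for_parity(slower_clock_hz, faster_clock_hz):
    """How much more work per clock the slower chip needs to break even."""
    return faster_clock_hz / slower_clock_hz


# Placeholder clocks, purely illustrative.
lower_clocked_hz = 2.2e9
higher_clocked_hz = 3.0e9

needed = ipc_ratio_for_parity(lower_clocked_hz, higher_clocked_hz)
print(f"IPC advantage needed for parity: {needed:.2f}x")

# Sanity check: with that IPC advantage, the two chips come out level.
level = math.isclose(
    relative_throughput(lower_clocked_hz, needed),
    relative_throughput(higher_clocked_hz, 1.0),
)
print(level)
```

Real workloads muddy this considerably (cache hits, branch prediction, memory bandwidth), which is exactly why the benchmarks matter more than the model.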

So if you are clued up on your processor make-up, which you hopefully are if you are reading this, you'll know that clock frequency isn't everything and there's more going on behind the scenes than meets the eye.

I could write a whole set of articles on the relative setups of each x86 design, since there's a lot to discuss and compare, but we'll stick with the above 'basics' for the purposes of this article.

There's one more basic feature of a modern processor to discuss before we move on to actual performance, and it's one affected by the platform changes these new processors bring to the table: processor I/O bandwidth. Processor I/O bandwidth is the width of the data bus the processor uses to communicate with the rest of the system (usually the northbridge), multiplied by the number of times data is sampled on that bus per second (the effective front side bus frequency).

Take a data bus width of 64 bits (or 8 bytes, remember it's 8 bits to a byte) and an effective front side bus frequency of 400MHz (200MHz, DDR), as provided by the test nForce2 Ultra 400 chipset and required by the new processors. That gives us 8 (bus width in bytes) x 400,000,000 (bus samples per second). A quick press of Windows calculator buttons tells us that's 3,200,000,000 bytes per second, or around 3.2GB/sec in real money.

The Pentium 4 does the QDR dance, and with the newest 'C' processors that gives us an 8 byte (64 bit) bus width and 800MHz of bus sampling power, for double the effective processor I/O bandwidth of Athlon XP at around 6.4GB/sec. That I/O bandwidth is used to communicate with the rest of the system, be it memory controller, AGP controller, etc.
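If you'd rather not reach for Windows calculator, the same bandwidth sums can be sketched in Python. The function and variable names are made up for illustration, and the 200MHz base clock for the Pentium 4 is inferred from the 800MHz effective figure and its quad-pumped (QDR) bus.

```python
def bus_bandwidth_bytes_per_sec(bus_width_bits, base_clock_hz, transfers_per_clock):
    """Peak bus bandwidth: bytes per transfer x transfers per second."""
    bytes_per_transfer = bus_width_bits // 8  # 8 bits to a byte
    effective_hz = base_clock_hz * transfers_per_clock
    return bytes_per_transfer * effective_hz


# Athlon XP: 64-bit bus, 200MHz base clock, double data rate (DDR).
athlon_xp_bw = bus_bandwidth_bytes_per_sec(64, 200_000_000, 2)

# Pentium 4 'C': 64-bit bus, 200MHz base clock, quad data rate (QDR).
pentium4_bw = bus_bandwidth_bytes_per_sec(64, 200_000_000, 4)

print(athlon_xp_bw)  # 3200000000
print(pentium4_bw)   # 6400000000
print(pentium4_bw / athlon_xp_bw)  # 2.0
```

These are theoretical peaks; what a platform actually sustains depends on the chipset and memory sitting behind the bus.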

So it all seems to balance itself out, doesn't it? The Athlon XP has lower clock speed and CPU I/O bandwidth than the P4, but bigger IPC and a better data cache layout to make up for it.

But does it balance out in favour of the newer Pentium 4 processors and their related Canterwood platform, or does the latest addition to the Athlon XP family have enough oomph when paired with 200MHz nForce2 to push ahead? That's what we'll test.