Thoughts
While it would take a case-by-case analysis to prove the Xeon's current shortcomings exhaustively, even a cursory educated guess gets you close. It's the basics of the platform that trip it up: limited resources shared among too many consumers leave it languishing some distance behind current SMP dual-core Opteron. Having a single DDR2-400 memory controller, with all the added latency and reduced bandwidth that implies, compared with a good desktop DDR2 solution for dual-core Intel, is one of the larger drawbacks. Registered memory is the right requirement for the audience and environments Paxville DP will serve, and even more so for Paxville MP, but Opteron's on-die memory controller does a better job of hiding the extra latency that registered memory brings. Less than 4GiB/sec to share among the four cores, even in the better-performing HT-off configuration, is pretty measly, with each Opteron core enjoying roughly twice that thanks to interleaved accesses to a shared MC on a 2-way dual-core Opteron system. Downstream of the MCH, if you want to do any I/O, there's not much of a link to the ICH5R, itself really old, and only the PCI-X segment bridge gets the kind of bandwidth a high-end workstation deserves.
Turning HT on for the Paxville system only made things worse. HyperThreading presents two logical processors per core for the OS to schedule threads on, but remember that there's still only one physical core underneath. There's still the same dual-ALU, dual-issue, long pipeline to share. Cache misses, pipeline stalls and the other hazards that hold back a single-threaded core can hurt up to twice as much on a HyperThreaded system. Make all those logical CPUs fight for access to scarce resources such as the memory controller and I/O ASICs and you have a recipe for something that isn't going to go very fast in the majority of situations.
Opteron simply does things much better, with an infrastructure and topology better suited to memory access and I/O, and HyperTransport links aplenty in modern multi-way Opteron systems for pretty much anything you can think of. Its basic CPU architecture, with a 14-stage, dual-issue (ADD and MUL per cycle) main integer pipeline that scales to within 400MHz of the 31-stage NetBurst core in Paxville, means it'll really just crap all over that core in terms of basic CPU-bound performance.
The only real situation where the 8-way Paxville DP setup will have a nice performance win over anything else in two sockets is if all eight threads come from the same HT-aware app doing a nice, easy blend of integer math that sits completely inside the 2MiB of L2 per core. Ideally, given how low-latency Paxville's L1 cache is, you'd want it all running inside the 16KiB of L1, but you'd be lucky. You don't want to do I/O, and you'd be just as well off not touching the MCH at all.
Ah well! Intel have been bumbling around with Xeon for ages, and this latest evolution brings little beyond a larger L2, dual cores and, for the most part, trouble with HT on. Outside of something like a webserver serving static content, or a contrived scientific application, I'm struggling to see the appeal at all.
Paxville DP at 2.8GHz is $1030 or so in quantity at the time of writing, making it $300 or so cheaper than the Opteron 280s that trounced it in this article. For the beating Opteron dishes out, a price difference of ~30% is just about right. That is, until you consider the 200MHz slower Opteron 275 at $1050, which would knock the Paxville out in much the same manner, meaning this judge is scoring it a no-contest. That renders moot the point about the Opteron 280 also being cooler and quieter, and therefore easier to live with as a workstation.
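As a rough check on the ~30% figure, here's a small illustrative sketch in Python that plugs in the approximate quantity prices quoted above. The Opteron 280 price is an assumption, derived from the "$300 or so" difference rather than a quoted list price, and the constant and function names are hypothetical. It shows that ~30% is measured against the Paxville's price; measured against the Opteron 280's price the gap is nearer 23%, while the Opteron 275 costs barely 2% more than the Paxville.

```python
# Approximate quantity prices in USD, as quoted in the text at the time of writing.
# Assumption: the Opteron 280 price is derived from "$300 or so" above the Paxville DP.
PAXVILLE_DP_2_8GHZ = 1030
OPTERON_280 = PAXVILLE_DP_2_8GHZ + 300
OPTERON_275 = 1050


def premium_pct(price: float, baseline: float) -> float:
    """Percentage by which `price` exceeds `baseline`, relative to `baseline`."""
    return (price - baseline) / baseline * 100


def saving_pct(price: float, reference: float) -> float:
    """Percentage saved by paying `price` instead of `reference`, relative to `reference`."""
    return (reference - price) / reference * 100


print(f"Opteron 280 premium over Paxville DP: {premium_pct(OPTERON_280, PAXVILLE_DP_2_8GHZ):.1f}%")
print(f"Paxville DP saving versus Opteron 280: {saving_pct(PAXVILLE_DP_2_8GHZ, OPTERON_280):.1f}%")
print(f"Opteron 275 premium over Paxville DP: {premium_pct(OPTERON_275, PAXVILLE_DP_2_8GHZ):.1f}%")
```

Run as-is, it prints 29.1%, 22.6% and 1.9% respectively, which is why the Opteron 275 leaves the Paxville with no price argument to fall back on.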
We wait for Woodcrest, then.