picCOLOR Image Analysis
picCOLOR is a benchmark I first came across on The Tech Report, where they use it for system analysis. After I contacted its author about 64-bit builds that run on AMD64 under Windows, along with a request for a new 32-bit build, it became apparent that The Tech Report use it for good reason.
picCOLOR's image analysis relies on tight loops, often written in hand-coded assembly language, for its performance, making it a nice CPU benchmark. It also calls GPU driver functions while outputting images to the display, so you can see the effect that different driver versions have on the same hardware.
For this article, however, the driver version is constant (61.76). I'll also comment on the driver while discussing the limited 64-bit performance testing.
32-bit Opteron vs 32-bit Xeon
Bar the array index test, the Opteron is either level with the Xeon or some way ahead, sometimes over twice as fast (as in the fast Fourier transform test).
A few tests are worth evaluating in more detail. The AddressMem test uses one CPU to draw the top of the screen and the other to draw the bottom, so L2 cache performance comes into play. If picCOLOR is swapped off the CPUs in the middle of that test, the caches are invalid when it's swapped back on, causing a cache miss (on both processors) and a trip to main memory, which slows it down.
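To make the top/bottom split concrete, here is a minimal sketch of that kind of work partitioning. It is not picCOLOR's code: the names `fill_rows` and `draw_split` are hypothetical, and a nested list stands in for the frame buffer. Note also that CPython threads do not run these loops truly in parallel, so this only illustrates how the rows are divided between two workers, not the cache behaviour itself.

```python
import threading


def fill_rows(image, row_start, row_end, value):
    """Write `value` into every pixel of rows [row_start, row_end)."""
    for y in range(row_start, row_end):
        row = image[y]
        for x in range(len(row)):
            row[x] = value


def draw_split(height, width):
    """Fill the top half with 1 and the bottom half with 2, one thread each."""
    image = [[0] * width for _ in range(height)]
    middle = height // 2
    workers = [
        threading.Thread(target=fill_rows, args=(image, 0, middle, 1)),
        threading.Thread(target=fill_rows, args=(image, middle, height, 2)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return image
```

Each worker touches only its own band of rows, so on a real dual-CPU system each processor's cache ends up holding a different half of the image. That is why losing the CPUs mid-test costs a refill on both processors.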
The Skeleton array indexing test is also cache bound, and in this case it favours the Prescott's cache usage over the Opteron's.
With regard to HyperThreading, the Prescott's tweaked HT implementation gives the Xeon good performance in test 9 (FP conversion). The code there uses the CPU's ATAN function heavily, which enjoys a speed boost on Prescott when HyperThreading is being worked hard.
Conversely, the functions that rely heavily on picCOLOR's hand-optimised MMX code are slower on Prescott than on Northwood or older Xeons.
Finally, on a test that's memory latency bound, Opteron is happy to run off into the distance. See the Watershed test, which would be slower on Opteron if it weren't for the on-CPU memory controller. Xeon's FPU can only do so much before it's left waiting for the memory controller to respond.
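For readers unfamiliar with what "memory latency bound" means in practice, the sketch below shows a generic dependent-load pattern, often called pointer chasing. This is an illustration of the access pattern only, not the Watershed algorithm or anything taken from picCOLOR. The names `build_chain` and `chase` are hypothetical, and in Python the interpreter overhead would swamp any real latency measurement.

```python
import random


def build_chain(n, seed=0):
    """Build a permutation of range(n) that forms a single cycle (Sattolo's algorithm)."""
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, which guarantees one cycle of length n
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, steps, start=0):
    """Follow the chain; each lookup depends on the result of the previous one."""
    index = start
    for _ in range(steps):
        index = chain[index]
    return index
```

Because the next address is not known until the current load completes, the CPU cannot overlap the accesses. Once the table outgrows the caches, run time is governed by how quickly main memory answers, which is where an on-CPU memory controller helps.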
Overall score
Overall, as seen in the above graph, the Opteron is the best CPU for applications like picCOLOR. If your application uses plenty of fairly big copy buffers, lots of tight SIMD processing loops and supports multi-threading (all of those I mention in a general sense), it seems that Opteron is the CPU for you.
64-bit performance
I was able to run some 64-bit picCOLOR tests on Nocona, using NVIDIA's 61.76 driver for Windows Server 2003 for 64-bit Extended Systems. Sadly, I was unable to run the same 64-bit build on the Opteron and work through the results with Dr. Mueller before the Opteron system had to be returned. Xeon vs Xeon it is, 32-bit vs 64-bit EM64T.
In the functions that don't use MMX loops, EM64T gives a decent speed boost, especially if the function performs interpolation using floating point numbers, where it's twice as fast. The C compiler used for picCOLOR also does a good job of asking for more GPRs when needed, to speed things up that way. However, MMX doesn't execute on the current 64-bit build of Windows Server 2003 due to compiler issues, so functions that would have used MMX in a 64-bit build of picCOLOR fall back to slower, alternate implementations in the build that was run.
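As an illustration of the kind of floating point interpolation mentioned above, here is a minimal bilinear sampling sketch. The article does not say which interpolation picCOLOR uses, so treat bilinear as an assumption. The function name `bilinear` and the nested-list image are hypothetical.

```python
def bilinear(image, x, y):
    """Sample a 2D list `image` at fractional (x, y), clamped to the image edges."""
    height, width = len(image), len(image[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally along the two neighbouring rows, then vertically.
    top = image[y0][x0] * (1.0 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1.0 - fx) + image[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy
```

Every output pixel costs several floating point multiplies and adds. Code of this shape is the sort that could benefit from the extra registers available in 64-bit mode.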
In general, 64-bit is a very nice win for picCOLOR and its CPU usage model, barring some issues Dr. Mueller has with arctan functions on both Opteron and Nocona.
Opteron is a monster CPU for picCOLOR and, it seems, for image analysis in general.