RysDB MySQL Transaction Benchmark
Dual 1.8GHz Opteron running 64-bit code versus dual 3.06GHz Xeon running 32-bit code. Will a 1.26GHz frequency deficit be clawed back by the memory controller and cache performance of the new CPU? Let's see the graph before we discuss any further.
I wasn't expecting the graph to look like that before I created it, but further analysis of the numbers and the benchmark itself bore it out. The Opteron system has a roughly 10% performance advantage over the Xeons in the raw MySQL transaction replay. Testing on a system that is receptive to CPU overclocking shows that this benchmark is mostly CPU limited.
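To put the clock deficit and the roughly 10% advantage into per-clock terms, here is a minimal sketch of the arithmetic. It assumes the benchmark scales linearly with CPU clock (the text only says it is mostly CPU limited) and treats the rough 10% figure as exactly 1.10, so the result is a ballpark rather than a measurement. The function name per_clock_ratio is purely illustrative.

```python
def per_clock_ratio(speedup, clock_a_ghz, clock_b_ghz):
    """Work done per clock by system A relative to system B.

    speedup is A's throughput divided by B's throughput.
    Assumes throughput scales linearly with clock speed.
    """
    return speedup * clock_b_ghz / clock_a_ghz


opteron_ghz = 1.8
xeon_ghz = 3.06
speedup = 1.10  # assumption: the rough 10% advantage taken as exact

deficit = xeon_ghz - opteron_ghz
ratio = per_clock_ratio(speedup, opteron_ghz, xeon_ghz)

print(f"clock deficit: {deficit:.2f} GHz")
print(f"per-clock throughput ratio: {ratio:.2f}x")
```

Under those assumptions the Opteron system gets through a little under twice as much of this workload per clock cycle as the Xeon system.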
Once the transaction log is loaded into memory from disk, a low latency memory controller is all that is needed to feed the processor so that it can churn through the data. With Opteron's memory controller operating with the same overall latency as a full speed controller on the latest Intel chipsets, the 1MB L2 cache takes over from there. Speculative hardware prefetch and the extra execution resources on the Opteron do the rest, and the 64-bit version of MySQL looks good as far as optimised software goes.
It's doubtful that, this early in the life of x86-64, 64-bit MySQL has gone through anything other than a run through an x86-64 aware gcc and some limited hand tuning, which makes the performance that bit more impressive.
It's interesting to see some slight performance scaling at different levels of the benchmark, on both processors. It's hard to explain. My personal theory is that the branch predictor 'learns' where the code will be going after the early runs through the data and can optimise accordingly. The graph doesn't quite scale the same way on the 32-bit Athlon XP, which gives that theory more credence, since the branch predictor on Opteron is improved over the older CPU.
Overall, a nice performance metric to validate the hard work AMD have put into Opteron and the entire ISA. It's also nice to see the Intel platform doing well: compared to single CPU numbers, the extra Xeon adds nearly 20% performance over a uniprocessor configuration.