Bulldozer benchmarks correct; but not definitive

After our own reader-base flagged the matter in comments last week, we began to investigate a rumour that there was a registry patch in the works that could offer a 40 per cent performance increase for AMD's Bulldozer CPUs.

Though, quite reasonably, the figure of 40 per cent appears to be somewhat exaggerated we have found both merit and proof that the Bulldozer does indeed suffer a performance penalty under scenarios with only a few intensive threads, that is an issue grounded entirely in software.

This all relates to the Bulldozer's unique shared architecture where two cores share some of their resources in what's known as modules. If we were to talk about a hyper-threading Core i7, typically an operating system's scheduler would assign intensive threads to separate real cores, before assigning less intensive threads to the virtual hyper-thread cores. In the case of the Bulldozer, schedulers see each core as a completely independent entity and do not recognise the shared relationship of cores and their modules.

This leads to various scenarios:

· Two intensive threads are placed on the same module, fighting for resources when there are spare modules or modules running only non-intensive threads.

· Two tightly interconnected threads that must share information are placed on separate modules, leading to a large communication delay between the two threads.

· Non-intensive threads are spread across multiple modules, preventing the Bulldozer from powering-down unused modules, affecting both energy-efficiency and the ability to employ Turbo Boost.

Scenario 2 and 3 un-optimised

Scenario 2 and 3 optimised

This all goes some way to explaining the poor performance of the Bulldozer in benchmarks that were not thread-heavy. We had never expected to see exceptional low-thread performance from an octo-core devise; but had found the Bulldozer's advanced 3.6GHz cores occasionally bested by the 3.3GHz cores of its hex-core Phenom II 1100T brother.

AMD confirms that this issue is real in a whitepaper, stating that both modifications to compilers and schedulers would be needed to maximise performance of the Bulldozer architecture. Full support for AVX and other new vector instructions has only been available since Visual Studio 2010 Service Pack 1 and compiler awareness for a shared-module architecture is still not present.

Likewise, the Windows 7 scheduler lacks shared-module architecture awareness, leading to the scenarios listed above. In simple tests performed by techreport.com, it was discovered that by setting a scheduler affinity, causing all threads to be placed on separate modules, a 10 to 20 per cent increase in performance could be observed over all but one of the test cases. This test only tackles the first scenario listed above and a fully-aware scheduler would be required to deal with all of the possible situations.

Windows 8 preview sports an enhanced work-in-progress scheduler that does have this awareness. In AMD benchmarks there was an average five per cent performance increase in real-world gaming scenarios. We stress that these figures are subject to change as Windows 8 is in early development and that enhancements to the scheduler are still being made.

AMD Bulldozer Windows 8 Benchmarks

Results certainly vary, with up to twenty percent performance increase observed in synthetic benchmarks with a forced core affinity and up to ten per cent observed in real-world gaming scenarios under an enhanced module-aware scheduler. We suspect this variation is caused by attempting to strike a balance across all of the above scenarios and that perhaps the perfect solution to the Bulldozer's unique architecture has yet to be found.

What we can say with certainty is that if you're looking for performance in the here and now, then the HEXUS benchmarks released earlier this month won't steer you wrong. If you're looking at the future capability of the Bulldozer CPU as a long-term investment, then early results indicate that you'll receive anywhere from two to twenty percent better performance in low-thread scenarios and that's without factoring in compiler enhancement.

We'll be sure to keep a close eye on any new developments.

HEXUS Forums :: 21 Comments

Login with Forum Account

Don't have an account? Register today!

Posted by deleted - Mon 31 Oct 2011 11:13

It is good to finally have evidence in support of this theory. It bodes well for AMD's future because without software derived performance improvements I wasn't sure they would be able to keep up with Intel.

Posted by deleted - Mon 31 Oct 2011 11:32

HW guys: “We've got this brilliant new architecture that takes a different approach to superscalar resource usage.”

System guys: “But won't that affect performance for software and OSes that aren't optimised to the new micro-architecture?”

HW guys: “We'll leave that one to the compiler writers.”

System guys: “Good call.”

Poor compiler writers :(

It's always been the case, of course. Unfortunately, in this instance, it's lead to some performance figures that haven't been very marketing-friendly.

Posted by deleted - Mon 31 Oct 2011 11:57

But its one of the big advantages of the RISC system, that simpler design, leads for simpler compilers, simpler debugging.

If AMD want to design a new chip, which has an un-convential design, and as such different from everything before in terms of optomisations, THEY should be leading the way….

Posted by deleted - Mon 31 Oct 2011 11:58

Deleted
HW guys: “We've got this brilliant new architecture that takes a different approach to superscalar resource usage.”

System guys: “But won't that affect performance for software and OSes that aren't optimised to the new micro-architecture?”

HW guys: “We'll leave that one to the compiler writers.”

System guys: “Good call.”

Poor compiler writers :(

It's always been the case, of course. Unfortunately, in this instance, it's lead to some performance figures that haven't been very marketing-friendly.

Reminds me of the same troubles Sony had with the CELL CPU and developers doing nothing but moan for the first year.

Posted by deleted - Mon 31 Oct 2011 12:07

It's a chicken and egg scenario though isn't it?

And you could argue AMD lead the way….by releasing the chip ;)

And yes, it reminds me very much of the PS2 and PS3 releases……although this one should be considerably easier to overcome!

SEE NEWER »

Bulldozer benchmarks correct; but not definitive

Related Reading

HEXUS Forums :: 21 Comments

Login with Forum Account

MY HEXUS

EVENTS

INDUSTRY PRESS RELEASES

User Name
Password