CUDA 5 - Kepler at its best

by Alistair Lowe on 24 October 2012, 11:17

Tags: NVIDIA (NASDAQ:NVDA)

Earlier this week, we highlighted the performance gains possible on AMD's Radeon HD 7xxx series of graphics cards with the release of entirely GCN-focused drivers. It's perhaps only fair, then, that we take a quick look at NVIDIA and its Kepler-optimised CUDA 5 general-purpose compute library.

Often, when we hear of CUDA and GPGPU compute, we think: this has nothing to do with games, so why should we care? The truth is that CUDA, and compute on the GPU in general, is a fast-growing market. There's clear potential for accelerating scientific simulations and media encode/decode, but also games, where a custom algorithm is sometimes needed to provide a new visual effect or a simulation of weather systems, both of which require massive parallelism. It's widely expected that all next-gen AAA game engines will utilise this form of acceleration in one way or another.

There are now over 375 million CUDA-capable GPUs in the world and, with each new architecture, NVIDIA introduces optimisations. CUDA 5 looks to bring out the best in Kepler, offering:

  • Dynamic Parallelism
  • GPU Object-linking
  • All-in-one Nsight Eclipse Edition plug-in for developing, debugging and optimising
  • GPUDirect

Most applicable to the gaming market, perhaps, are the first two features. Dynamic Parallelism lets the GPU spawn more or fewer parallel tasks as resources become available, and lets GPU code decide the parameters of newly launched tasks for itself. This avoids the often lengthy process of returning results to the CPU and waiting for a response, or launching a fixed task that may turn out to be unsuitable given previous results.
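
To make the idea concrete, here is a minimal sketch of a parent kernel that sizes and launches child grids from data already on the GPU. The kernel and helper names (`scale_segment`, `launch_per_segment`, `scale_segments_on_gpu`) and the segment layout are our own illustrative assumptions, not anything taken from NVIDIA's materials.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Child kernel: doubles the n values of one segment.
__global__ void scale_segment(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Parent kernel: one thread per segment. Each thread reads its segment's
// size from GPU memory and launches a child grid sized to fit, with no
// round trip to the CPU.
__global__ void launch_per_segment(float* data, const int* offsets,
                                   const int* counts, int segments) {
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= segments) return;
    int n = counts[s];
    if (n <= 0) return;  // empty segment: launch nothing
    const int threads = 128;
    int blocks = (n + threads - 1) / threads;
    scale_segment<<<blocks, threads>>>(data + offsets[s], n);
}

// Hypothetical host helper: splits `values` into consecutive segments of the
// given sizes and doubles every value on the GPU. Returns false on any error.
bool scale_segments_on_gpu(std::vector<float>& values,
                           const std::vector<int>& counts) {
    int segments = static_cast<int>(counts.size());
    std::vector<int> offsets(segments);
    int total = 0;
    for (int s = 0; s < segments; ++s) { offsets[s] = total; total += counts[s]; }
    if (segments == 0 || total == 0 || total != static_cast<int>(values.size()))
        return false;

    float* d_data = nullptr;
    int* d_off = nullptr;
    int* d_cnt = nullptr;
    bool ok =
        cudaMalloc(&d_data, total * sizeof(float)) == cudaSuccess &&
        cudaMalloc(&d_off, segments * sizeof(int)) == cudaSuccess &&
        cudaMalloc(&d_cnt, segments * sizeof(int)) == cudaSuccess &&
        cudaMemcpy(d_data, values.data(), total * sizeof(float),
                   cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_off, offsets.data(), segments * sizeof(int),
                   cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_cnt, counts.data(), segments * sizeof(int),
                   cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        launch_per_segment<<<(segments + 63) / 64, 64>>>(d_data, d_off, d_cnt,
                                                         segments);
        // The parent grid only completes once its child grids have finished,
        // so a single host-side synchronise covers both levels.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess &&
             cudaMemcpy(values.data(), d_data, total * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_data);
    cudaFree(d_off);
    cudaFree(d_cnt);
    return ok;
}
```

Device-side launches need a GPU of compute capability 3.5 or later and relocatable device code, so on a CUDA 5-era toolkit a build would look something like `nvcc -arch=sm_35 -rdc=true demo.cu -lcudadevrt`. The listing has no `main`; call `scale_segments_on_gpu` from your own host code.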

From a gaming perspective, Dynamic Parallelism could help establish smoother frame rates and maximise GPU usage. GPU Object-linking is a big one for development houses as well. Developers can now split GPU programming work into separate pieces and have them developed in parallel by multiple coders, with the pieces stitched together simply by supplying the finished object files.
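
As a rough illustration of object linking, the sketch below shows a device function and a kernel that calls it through nothing more than a declaration. It is presented as one listing for brevity; the file names in the comments, the function names and the host helper are all hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// ---- would live in physics.cu (hypothetical file, owned by one coder) ----
// A device function that can be compiled on its own into physics.o.
__device__ float integrate(float position, float velocity, float dt) {
    return position + velocity * dt;
}

// ---- would live in render.cu (hypothetical file, owned by another coder) ----
// Only the declaration is needed here; in a split build the device linker
// resolves the call against physics.o.
extern __device__ float integrate(float position, float velocity, float dt);

__global__ void advance(float* positions, const float* velocities,
                        float dt, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) positions[i] = integrate(positions[i], velocities[i], dt);
}

// Hypothetical host helper: advances every position by velocity * dt on the
// GPU. Returns false on any error.
bool advance_on_gpu(std::vector<float>& positions,
                    const std::vector<float>& velocities, float dt) {
    int n = static_cast<int>(positions.size());
    if (n == 0 || velocities.size() != positions.size()) return false;
    size_t bytes = n * sizeof(float);

    float* d_pos = nullptr;
    float* d_vel = nullptr;
    bool ok =
        cudaMalloc(&d_pos, bytes) == cudaSuccess &&
        cudaMalloc(&d_vel, bytes) == cudaSuccess &&
        cudaMemcpy(d_pos, positions.data(), bytes,
                   cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_vel, velocities.data(), bytes,
                   cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        advance<<<(n + 127) / 128, 128>>>(d_pos, d_vel, dt, n);
        // The blocking copy back waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(positions.data(), d_pos, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_pos);
    cudaFree(d_vel);
    return ok;
}
```

Split into the two files, each half would be compiled to an object with `nvcc -dc physics.cu render.cu` and the results linked with `nvcc physics.o render.o -o app`, assuming a `main` is supplied elsewhere. The point is that the team writing `advance` never needs the source of `integrate`, only its declaration and the object file.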

Though GPUDirect currently has little obvious use in gaming, it is by far CUDA 5's most impressive feature. GPUDirect enables GPUs to perform direct memory transfers, not only to other GPUs sharing the same PCI-E bus but also to other devices. Place a DMA-capable network card on the bus and suddenly you have direct memory transfers from one GPU to any other PCI-E device on a network, without any CPU intervention.
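
The GPU-to-GPU half of this can be sketched with the CUDA runtime's peer-to-peer calls. The example below assumes two peer-capable GPUs numbered 0 and 1, and the helper name `copy_gpu0_to_gpu1` is our own. It does not cover transfers to a network card, which depend on driver-level support from the third-party device.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: uploads `src` to GPU 0, copies it directly to GPU 1,
// then reads it back from GPU 1 into `dst`.
// Returns 1 on success, 0 if two peer-capable GPUs are not available,
// and -1 on a CUDA error.
int copy_gpu0_to_gpu1(const std::vector<float>& src, std::vector<float>& dst) {
    if (src.empty()) return -1;
    int devices = 0;
    if (cudaGetDeviceCount(&devices) != cudaSuccess || devices < 2) return 0;

    // Can device 1 address memory that lives on device 0?
    int can_access = 0;
    if (cudaDeviceCanAccessPeer(&can_access, 1, 0) != cudaSuccess || !can_access)
        return 0;

    size_t bytes = src.size() * sizeof(float);
    float* d0 = nullptr;
    float* d1 = nullptr;
    dst.assign(src.size(), 0.0f);

    bool ok = cudaSetDevice(0) == cudaSuccess &&
              cudaMalloc(&d0, bytes) == cudaSuccess &&
              cudaMemcpy(d0, src.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaSetDevice(1) == cudaSuccess &&
              cudaMalloc(&d1, bytes) == cudaSuccess;
    if (ok) {
        // Let the current device (1) access device 0. An "already enabled"
        // result is harmless, so treat it as success and clear the error.
        cudaError_t e = cudaDeviceEnablePeerAccess(0, 0);
        if (e == cudaErrorPeerAccessAlreadyEnabled) {
            cudaGetLastError();
            e = cudaSuccess;
        }
        ok = e == cudaSuccess &&
             // Device-to-device copy; the data is not staged in host memory
             // when peer access is enabled.
             cudaMemcpyPeer(d1, 1, d0, 0, bytes) == cudaSuccess &&
             cudaMemcpy(dst.data(), d1, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d1);
    cudaSetDevice(0);
    cudaFree(d0);
    return ok ? 1 : -1;
}
```

On a machine with a single GPU, or with two GPUs that cannot reach each other over the bus, the helper simply reports 0 rather than attempting the copy.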

These new features excite us, and we expect to see some serious practical usage of GPGPU compute in gaming next year with the release of new high-end consoles. By no means is the GPU market showing signs of slowing.

HEXUS Forums :: 18 Comments

Poor Hexus. We already knew all this back when Kepler was released. For example, this article back in May:

http://www.theregister.co.uk/2012/05/15/nvidia_kepler_tesla_gpu_revealed/page2.html [theregister.co.uk]

hexus
These new features excite us and, we expect to see some serious practical usage in gaming next year with the release of new high-end consoles.
I didn't think the next set of consoles had nVidia chips in, does Hexus know something we don't?

If you're serious about PC gaming then these features shouldn't excite you, only give cause to worry that the already small market is going to be fragmented even further. We need work on cross-vendor features like OpenCL instead.

And highlighting old features just after an AMD driver release (I don't recall Hexus doing the reverse after nVidia driver release news) smacks of influence in Hexus' decision making. Come on, you're better than that.
I don't agree with your thinking on this one, Kalniel.

CUDA 5 was officially launched last week so it makes sense to cover it some way.

Also, NVIDIA rolled out new beta GeForce drivers yesterday that offer up to 15 per cent extra performance, though the majority of gains are sub-five per cent.

http://www.geforce.com/whats-new/articles/nvidia-geforce-310-33-beta-drivers-released/ [geforce.com]

We took a good look at the improvements and decided that, on balance, the gains weren't significant enough for a full-on analysis, per Catalyst 12.11.

Take a look at the amount of AMD vs. NVIDIA coverage in the last few weeks, too.
Thanks for the reply Tarinder. I was surprised there was no coverage of the nVidia drivers - that would have been more consistent given past coverage and appropriate IMHO.

On the other hand, you make the point in the article that it's only fair to cover CUDA 5 given the AMD driver news. It doesn't make sense to me to only cover it now and in some way as a response to the driver news.

Nor does it make sense to me to claim that you'll see this in new gaming consoles or imply that "It's highly expected that all next-gen AAA game engines will utilise this form of acceleration in one way or another." It's only if you carefully look at the exact wording that you see you might be talking about general GPU compute rather than CUDA, when the whole tone of the article is closely tied into nVidia's Kepler.
There's clear potential for accelerating scientific simulations, media encode/decode but also, games, where sometimes a custom algorithm is needed to provide a new visual effect or simulation of weather systems, which require massive parallelism.
I bought an example of this a while ago. There was a special offer on Just Cause 2 on the PC (very, very cheap!) so given I loved that game on the XBox, I bought it for my PC. I was pretty unhappy then when I was unable to get anything like a decent frame rate without resorting to sub-console graphics levels.

However, when I tried the option to "run the water simulation on the GPU instead of CPU" (not the exact description, but close enough) the difference was staggering. Not only were the graphics now very smooth, but I was also able to ramp up the resolution to 1080p and hit the high AA and AF settings.

I know that AMD Phenom II's aren't exactly powerhouses these days, but I didn't expect that moving one game aspect from CPU to my (now elderly) GF460 would make such a marked difference.

My point being that although CUDA and OpenCL don't seem relevant to gamers at the moment (as the article says) they might well be increasingly so as GPU's get more and more powerful. Personally though I think it's a shame that NVidia decided to do their own thing rather than get behind OpenCL - fragmentation = bad!
crossy
My point being that although CUDA and OpenCL don't seem relevant to gamers at the moment (as the article says) they might well be increasingly so as GPU's get more and more powerful. Personally though I think it's a shame that NVidia decided to do their own thing rather than get behind OpenCL - fragmentation = bad!

Potentially it looks good, but it does worry me at times whether AMD or Nvidia make sure things look better when run on the GPU as opposed to the CPU. Look at PhysX for example - Nvidia made sure it used inefficient x87 paths and was single-threaded in the past when run on a Windows PC, even though there are paths which are more efficient. However, on consoles a much more efficient path is used.