Nvidia CUDA 6 offers a unified memory programming system

by Mark Tyson on 15 November 2013, 13:15

Tags: NVIDIA (NASDAQ:NVDA), PC

Nvidia has announced a new version of its CUDA parallel computing platform and programming model, ahead of the annual International Conference for High Performance Computing, Networking, Storage, and Analysis in Denver next week (also known, in short, as SC13). CUDA 6 is said to “dramatically simplify parallel programming” by offering programmers unified memory access and also to accelerate applications using drop-in libraries and multi-GPU scaling.

With the latest implementation of unified memory support in CUDA 6, Nvidia has simplified programming “by enabling applications to access CPU and GPU memory without the need to manually copy data from one to the other, and makes it easier to add support for GPU acceleration in a wide range of programming languages”. To be clear, this is just a simplification for the programmer: memory copies between system and GPU memory still have to be made and the latencies remain, but CUDA 6 handles those transfers for you transparently.
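As a rough illustration of what this means in practice, here is a minimal sketch of a SAXPY kernel that uses a single managed allocation in place of separate host and device buffers with explicit `cudaMemcpy` calls. The kernel, the helper name `saxpy_managed_max_error` and the test values are assumptions made for this example and do not come from Nvidia's announcement; `cudaMallocManaged` is the managed allocation call in the CUDA 6 runtime.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Each thread computes one element of y = a * x + y.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper for this sketch: runs SAXPY on managed memory and
// returns the largest deviation from the expected result, or -1 on a CUDA error.
float saxpy_managed_max_error(int n) {
    float *x = nullptr, *y = nullptr;

    // One pointer per array, valid on both the CPU and the GPU.
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) return -1.0f;
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return -1.0f;
    }

    // Initialise on the CPU; no host-to-device cudaMemcpy is written here.
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 3.0f, x, y);

    // The CPU must wait for the kernel before touching managed data again.
    cudaError_t err = cudaDeviceSynchronize();

    float max_err = -1.0f;
    if (err == cudaSuccess) {
        max_err = 0.0f;
        // Read the result directly on the CPU; expected value is 3 * 1 + 2 = 5.
        for (int i = 0; i < n; ++i)
            max_err = fmaxf(max_err, fabsf(y[i] - 5.0f));
    }

    cudaFree(x);
    cudaFree(y);
    return max_err;
}

int main() {
    printf("max error: %f\n", saxpy_managed_max_error(1 << 20));
    return 0;
}
```

The data still travels between system and GPU memory behind the scenes; the difference is that the runtime, not the programmer, decides when those copies happen.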

This simplified implementation should both speed up coding for existing developers and attract new developers. “By automatically handling data management, Unified Memory enables us to quickly prototype kernels running on the GPU and reduces code complexity, cutting development time by up to 50 percent,” attested Rob Hoekstra, manager of the Scalable Algorithms Department at Sandia National Laboratories.

Drop-in libraries and multi-GPU scaling are also implemented in CUDA 6. We are told that the drop-in libraries will automatically accelerate “BLAS and FFTW calculations by up to 8X by simply replacing the existing CPU libraries with the GPU-accelerated equivalents”. Multi-GPU scaling is also supported in the new BLAS and FFT GPU libraries. These libraries “automatically scale performance across up to eight GPUs in a single node, delivering over nine teraflops of double precision performance per node, and supporting larger workloads than ever before (up to 512GB)”.

The new CUDA 6 platform will be further detailed next week at SC13 in Denver. The updated CUDA toolkit is expected to be made available in early 2014.
HEXUS Forums :: 7 Comments

Is this Nvidia catching up with AMD?
Luke7
Is this Nvidia catching up with AMD?

Entirely separate. This is CUDA being further improved, which is nice given that current CUDA is already leaps and bounds better than OpenCL.
Nvidia CUDA, AMD Mantle, Nvidia CUDA 6

Really?!
tribaljet
Luke7
Is this Nvidia catching up with AMD?

Entirely separate. This is CUDA being further improved, which is nice given that current CUDA is already leaps and bounds better than OpenCL.

But OpenCL is cross-platform, whereas CUDA is just Nvidia? So realistically it SHOULD be better than OpenCL because there is less compatibility/optimizing work to be done. IMO anyway... I'm not really up to date with these things.

So how is this separate from AMD's Unified Memory Architecture? I understand they are different companies, but the principle is exactly the same, is it not?
tribaljet
Luke7
Is this Nvidia catching up with AMD?

Entirely separate. This is CUDA being further improved, which is nice given that current CUDA is already leaps and bounds better than OpenCL.

Existing CPU libraries are “not so easy” to turn into “GPU intensive” libraries. Examples include codecs and V-Ray. Now don't start a follow-up discussion. I have enough hands-on experience to tell that V-Ray works slower on a CUDA-enabled Nvidia card than on a multicore processor. There was always an option to turn on CUDA in V-Ray, but it never accelerated the workload. The same thing goes for codecs. In encoding and decoding the GPU sucks and so does CUDA. I need these 2 things to improve and I don't care if it is Nvidia or AMD. Until then both of their gibberish sucks.