We detailed NVIDIA's GK110 Kepler architecture last week and, only yesterday, revealed that the firm's mystery Tesla K20X was responsible for placing Cray's Titan supercomputer at the top of the supercomputing charts.
Today, NVIDIA has at last officially announced its two new GK110 Tesla cards, the Tesla K20X and Tesla K20, and it is now possible to purchase systems based on these units from OEMs and resellers.
Features | Tesla K20X | Tesla K20 | Tesla K10 | Tesla M2090 | Tesla M2075 |
Number and type of GPU | 1 Kepler GK110 | 1 Kepler GK110 | 2 Kepler GK104s | 1 Fermi GPU | 1 Fermi GPU |
GPU computing applications | Seismic processing, CFD, CAE, Financial computing, Computational chemistry and Physics, Data analytics, Satellite imaging, Weather modeling | Seismic processing, CFD, CAE, Financial computing, Computational chemistry and Physics, Data analytics, Satellite imaging, Weather modeling | Seismic processing, signal and image processing, video analytics | Seismic processing, CFD, CAE, Financial computing, Computational chemistry and Physics, Data analytics, Satellite imaging, Weather modeling | Seismic processing, CFD, CAE, Financial computing, Computational chemistry and Physics, Data analytics, Satellite imaging, Weather modeling |
Peak double precision floating point performance | 1.31 Tflops | 1.17 Tflops | 190 Gflops (95 Gflops per GPU) | 665 Gflops | 515 Gflops |
Peak single precision floating point performance | 3.95 Tflops | 3.52 Tflops | 4577 Gflops (2288 Gflops per GPU) | 1331 Gflops | 1030 Gflops |
Memory bandwidth (ECC off) | 250 GB/sec | 208 GB/sec | 320 GB/sec (160 GB/sec per GPU) | 177 GB/sec | 150 GB/sec |
Memory size (GDDR5) | 6 GB | 5 GB | 8 GB (4 GB per GPU) | 6 GB | 6 GB |
CUDA cores | 2688 | 2496 | 3072 (1536 per GPU) | 512 | 448 |
Even more of a beast than the Tesla K20, the K20X features an amazing 2,688 CUDA cores on a single die (which works out to 14 SMX units at 192 cores apiece), 6GB of RAM and increased memory throughput. At 1.31 Teraflops of peak double-precision floating-point performance, the Kepler-based Tesla is roughly twice as powerful in raw figures as its Fermi counterpart, the 665 Gigaflops M2090, and even more so in reality, all whilst utilising less power.
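To put the "twice as powerful" claim in numbers, here is a minimal sketch that divides the peak figures from the comparison table. The dictionary and the `speedup` helper are our own illustrative names, and the values are peak theoretical throughput only, so the ratios say nothing about real-world workloads.

```python
# Peak throughput in Gflops, copied from the comparison table.
# These are theoretical peaks, not measured application performance.
PEAK_GFLOPS = {
    "Tesla K20X": {"double": 1310.0, "single": 3950.0},
    "Tesla K20": {"double": 1170.0, "single": 3520.0},
    "Tesla M2090": {"double": 665.0, "single": 1331.0},
}


def speedup(new_card, old_card, precision):
    """Return the ratio of peak throughput between two cards."""
    return PEAK_GFLOPS[new_card][precision] / PEAK_GFLOPS[old_card][precision]


if __name__ == "__main__":
    for precision in ("double", "single"):
        ratio = speedup("Tesla K20X", "Tesla M2090", precision)
        print(f"K20X vs M2090, {precision} precision: {ratio:.2f}x")
```

On these figures the K20X comes out at roughly 1.97x the M2090 in double precision and roughly 2.97x in single precision.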
Unlike the K10, the K20 range features a complete internal and external ECC pipeline, along with support for Dynamic Parallelism and Hyper-Q functionality. Use of ECC does, however, come at a cost of 12.5 per cent of memory capacity and a little performance.
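As a rough illustration of what that 12.5 per cent means in practice, the sketch below works out the usable capacity on each card. It assumes the overhead applies uniformly to the full quoted memory size, which is our simplification; the function name is hypothetical.

```python
# Fraction of memory capacity reserved when ECC is enabled (12.5 per cent,
# as quoted above). Assumed to apply uniformly to the full capacity.
ECC_OVERHEAD = 0.125


def usable_memory_gb(total_gb, ecc_enabled=True):
    """Return the memory left for applications, in GB."""
    if ecc_enabled:
        return total_gb * (1 - ECC_OVERHEAD)
    return total_gb


if __name__ == "__main__":
    for card, total in (("Tesla K20X", 6), ("Tesla K20", 5)):
        print(f"{card}: {usable_memory_gb(total):.3f} GB usable with ECC on")
```

Under that assumption, the 6GB K20X is left with 5.25GB and the 5GB K20 with 4.375GB once ECC is switched on.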
These are exciting times for GPGPU compute and, with the increasing presence of graphics cores in mainstream processors, we wonder at what point writing GPU code will become a common part of every programmer's daily routine.