At the Supercomputing Conference 2010 this week, NVIDIA chief scientist Bill Dally went into a little bit of detail (courtesy of EE Times) on the research the company was doing towards the next generation of high-performance computing and how it might bring performance to the exascale level.
The most powerful supercomputers in the world currently perform on the scale of a few petaFLOPS, or thousands of teraFLOPS, which works out to several quadrillion calculations per second. An exascale computer would need to be around 1,000 times faster, capable of at least a quintillion calculations per second. For reference, the ATI Radeon HD 4870 was touted as the first "terascale" GPU, capable of performing at the one-teraFLOPS level.
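To keep the prefixes straight, the scales mentioned above can be laid out as plain powers of ten. This is only a sketch of the unit arithmetic; the constant names are illustrative and not taken from any NVIDIA material.

```python
# SI prefixes expressed as floating-point operations per second (FLOPS).
# The names TERA, PETA and EXA are illustrative constants.
TERA = 10**12   # a trillion operations per second
PETA = 10**15   # a quadrillion operations per second
EXA = 10**18    # a quintillion operations per second

print("teraFLOPS per petaFLOPS:", PETA // TERA)
print("petaFLOPS per exaFLOPS:", EXA // PETA)
print("teraFLOPS per exaFLOPS:", EXA // TERA)
```

In other words, an exascale machine sits a factor of 1,000 above a petascale one and a factor of a million above a "terascale" GPU.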
More cores, more cache, less power
The most important factor, according to Dally, will be efficiency. He foresees a single graphics core that can complete a floating-point operation using only 10 picojoules of energy, compared to 200 picojoules for the Fermi architecture. A chip - which has been codenamed Echelon - would then be built from 128 streaming multiprocessors (SMs), each of which would contain eight such cores, creating a 1,024-core GPU.
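The figures quoted here are easy to check with a few lines of arithmetic. The sketch below multiplies out the core count, compares the two energy-per-operation numbers, and estimates how much power the arithmetic alone would draw at a sustained rate. The function name and the ten-teraFLOPS rate used as an example are assumptions for illustration, and the estimate ignores memory, interconnect and everything else on the chip.

```python
# Figures quoted in the article.
SMS_PER_CHIP = 128          # streaming multiprocessors per Echelon chip
CORES_PER_SM = 8            # cores per streaming multiprocessor
ECHELON_PJ_PER_FLOP = 10    # picojoules per floating-point operation
FERMI_PJ_PER_FLOP = 200     # picojoules per floating-point operation

total_cores = SMS_PER_CHIP * CORES_PER_SM
efficiency_gain = FERMI_PJ_PER_FLOP / ECHELON_PJ_PER_FLOP


def arithmetic_power_watts(pj_per_flop, flops_per_second):
    """Power drawn by the floating-point operations alone (hypothetical helper).

    Energy per operation times operations per second gives joules per
    second, i.e. watts. One picojoule is 1e-12 joules.
    """
    return pj_per_flop * flops_per_second / 1e12


# Assumed sustained rate for the example: ten teraFLOPS.
EXAMPLE_FLOPS = 10 * 10**12

print("cores per chip:", total_cores)
print("energy efficiency gain over Fermi:", efficiency_gain)
print("Echelon arithmetic power (W):",
      arithmetic_power_watts(ECHELON_PJ_PER_FLOP, EXAMPLE_FLOPS))
print("Same rate at Fermi's energy cost (W):",
      arithmetic_power_watts(FERMI_PJ_PER_FLOP, EXAMPLE_FLOPS))
```

The gap between the last two lines is the reason efficiency comes first: at Fermi's energy cost per operation, the arithmetic alone at that rate would need kilowatts rather than around a hundred watts.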
Cache is another important factor, and Dally suggested that Echelon could have up to 256MB. It would be possible to dynamically apportion this memory into as many as six levels depending on the needs of the application.
Extreme performance
All of these architectural considerations will add up to a lot of performance. Because each core would be able to deal with four operations per clock cycle - as opposed to one in current cores - an Echelon chip with twice as many cores as one of today's GPUs would be capable of hitting performance of around ten teraFLOPS.
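Peak throughput is usually estimated as cores times operations per clock cycle times clock rate. The sketch below assumes the four operations per core are issued every clock cycle, and works backwards from the ten-teraFLOPS figure to the clock rate it would imply. That clock rate is a derived number, not something Dally is reported to have stated, and the function names are illustrative.

```python
CORES = 1024                 # 128 SMs x 8 cores, as quoted in the article
OPS_PER_CORE_PER_CLOCK = 4   # assumed to be per clock cycle
TARGET_FLOPS = 10 * 10**12   # "around ten teraFLOPS"


def peak_flops(cores, ops_per_clock, clock_hz):
    """Theoretical peak: every core issues its full width every cycle."""
    return cores * ops_per_clock * clock_hz


def implied_clock_hz(target_flops, cores, ops_per_clock):
    """Clock rate needed to reach target_flops at peak issue rate."""
    return target_flops / (cores * ops_per_clock)


clock_hz = implied_clock_hz(TARGET_FLOPS, CORES, OPS_PER_CORE_PER_CLOCK)
print("operations per clock, whole chip:", CORES * OPS_PER_CORE_PER_CLOCK)
print("implied clock rate (GHz):", round(clock_hz / 1e9, 2))
```

Under these assumptions the chip would need to run at roughly 2.4GHz to reach ten teraFLOPS, so the quoted figure should be read as a rough peak rather than a measured result.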
Obviously these sorts of developments will eventually make their way into products across NVIDIA's range. Dally even suggested that a single eight-core SM could one day be the basis of the company's next-generation mobile chips. Unfortunately, the manufacturing processes and facilities now have to catch up with the designs and simulations that the scientists are currently working on.
The research is aimed at the DARPA-funded Ubiquitous High Performance Computing project, in which NVIDIA is competing along with Intel, MIT and Sandia National Labs. The goal of the project is to develop a petascale computer in a 57-kilowatt rack by 2014, which could then be used as the basis of an exascale supercomputer by 2018.
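The rack target translates directly into an efficiency figure. The sketch below works out the performance per watt that one petaFLOPS in 57 kilowatts implies, and what an exascale system would draw if it were built by simply replicating such racks. That naive replication is an assumption made here for illustration; the article does not say how the exascale machine would be assembled.

```python
RACK_FLOPS = 10**15      # one petaFLOPS per rack (project goal for 2014)
RACK_POWER_W = 57_000    # 57 kilowatts per rack
EXA_FLOPS = 10**18       # exascale target for 2018

# Whole-system efficiency implied by the rack target.
gflops_per_watt = RACK_FLOPS / RACK_POWER_W / 1e9
system_pj_per_flop = RACK_POWER_W / RACK_FLOPS * 1e12

# Assumption: exascale reached by replicating identical racks.
racks_needed = EXA_FLOPS // RACK_FLOPS
exascale_power_mw = racks_needed * RACK_POWER_W / 1e6

print("GFLOPS per watt:", round(gflops_per_watt, 1))
print("system energy per operation (pJ):", round(system_pj_per_flop, 1))
print("racks for one exaFLOPS:", racks_needed)
print("total power (MW):", exascale_power_mw)
```

Note that the system-wide energy budget per operation comes out several times larger than the 10 picojoules quoted for a single core's arithmetic, which leaves room for memory, communication and cooling overheads.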