Improvements Throughout
The GeForce GTX 1080 is the GP104 die in its fullest form. A high-level overview shows that it is logically similar to various Maxwell GPUs... and this is no coincidence. Nvidia's architecture choices around this graphics processor centre on increasing overall performance by a combination of sky-high frequencies and some nifty technology for virtual reality.
It's appropriate to think of the GP104 die as a scaled-up version of the GM204 powering the GTX 980. There are now 20 SM units, up from 16, arranged over four GPC clusters, but each one continues to carry 128 CUDA cores (split into four groups that process 32-thread warps), 256KB of register file capacity, 96KB of shared memory, 48KB of L1 cache, and eight texture units. There remains the 2,048KB of L2 cache and 64 back-end ROPs, as well, so you may wonder what all the Pascal fuss is about.
Given that GTX 1080 can be thought of as a GTX 980 with 25 per cent 'extra' in terms of general architecture ammunition, plus the same 256-bit memory pathway, albeit clocked much higher, there has to be some secret sauce for Nvidia to claim prodigious performance that comfortably topples the much bigger GTX 980 Ti.
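To put a number on that 25 per cent, here is a quick back-of-the-envelope sketch using only the SM counts and the 128-cores-per-SM figure quoted above. The function name is ours, purely for illustration.

```python
CORES_PER_SM = 128  # from the text: each SM carries 128 CUDA cores


def cuda_cores(sm_count: int) -> int:
    """Total CUDA cores for a die with the given number of SM units."""
    return sm_count * CORES_PER_SM


gp104 = cuda_cores(20)  # GTX 1080
gm204 = cuda_cores(16)  # GTX 980
uplift = gp104 / gm204 - 1

print(gp104, gm204, f"{uplift:.0%}")  # 2560 2048 25%
```

The same ratio applies to anything that scales per SM, such as the texture units.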
These Go To Eleven
Part of Pascal's secret recipe is realised by the fastest shipping frequencies ever seen on a consumer graphics card. In a world where 1,000MHz has previously been considered a solid core frequency and 1,200MHz the upper limit for acceptable yields, GTX 1080 cranks up the 2,560 cores to 1,607MHz, boosting to at least 1,733MHz. Such rampaging clocks fit nicely into a 180W TDP. The question that we need to answer is how?
We've spoken about the positive attributes of moving down manufacturing processes, and one such benefit is the ability to drive higher frequencies. One would expect the 16nm geometry to offer a reasonable bump in frequency, though not as large as the one Nvidia has achieved.
Jonah Alben, who oversaw Pascal, said to us that a huge amount of work had been done to minimise the number of 'violating paths' that stand in the way of additional frequency. This is critical-path analysis by another name, where engineers pore over the architecture to find and eliminate the worst timing paths that actually limit chip speed. If successful, as appears to be the case here, the frequency potential is shifted to the right, or higher. Alben reckoned that Nvidia managed a good 'few hundred megahertz' by going down this path, if you excuse the pun, so Pascal is Maxwell refined to within an inch of its life.
2,560 cores operating at 1.7GHz offer nearly 9TFLOPS of single-precision performance, though in keeping with the gamer-centric positioning of the GTX 1080, double-precision support isn't even mentioned by Nvidia, so we'll have to get confirmation on its capabilities. Just as with the memory, Nvidia has gone narrow and very fast, thus achieving the same sort of overall throughput previously only available on slower-clocked 'big-chip' GPUs.
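The 'nearly 9TFLOPS' figure can be reproduced with a minimal sketch. It assumes the usual convention of counting a fused multiply-add as two floating-point operations per core per clock, which the article does not state explicitly; the function name is hypothetical.

```python
def peak_gflops(cores: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak single-precision throughput in GFLOPS.

    flops_per_clock=2 is an assumption: one fused multiply-add per core
    per clock, counted as two operations.
    """
    return cores * flops_per_clock * clock_mhz / 1000.0


print(round(peak_gflops(2560, 1733)))  # at the 1,733MHz boost clock: 8873
print(round(peak_gflops(2560, 1607)))  # at the 1,607MHz base clock: 8228
```

Note that the headline number relies on the boost clock; at the base clock the same chip sits a little over 8.2TFLOPS.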
Generational Evolution
This talk of speeds and refinements is a good opportunity for pause, so we'll roll out the usual table to examine the GTX 1080's vital statistics and compare them more fully with other recent GTX GPUs.
| | GeForce GTX 1080 | GeForce GTX 980 Ti | GeForce GTX 980 | GeForce GTX 780 Ti |
|---|---|---|---|---|
| Launch date | May 2016 | June 2015 | September 2014 | November 2013 |
| Codename | GP104 | GM200 | GM204 | GK110 |
| Architecture | Pascal | Maxwell | Maxwell | Kepler |
| Process (nm) | 16 | 28 | 28 | 28 |
| Transistors (bn) | 7.2 | 8.0 | 5.2 | 7.1 |
| Die Size (mm²) | 314 | 601 | 398 | 561 |
| Core Clock (MHz) | 1,607 | 1,000 | 1,126 | 876 |
| Boost Clock (MHz) | 1,733 | 1,076 | 1,216 | 928 |
| Shaders | 2,560 | 2,816 | 2,048 | 2,880 |
| GFLOPS | 8,873 | 6,060 | 4,981 | 5,345 |
| Memory Size | 8GB | 6GB | 4GB | 3GB |
| Memory Bus | 256-bit | 384-bit | 256-bit | 384-bit |
| Memory Type | GDDR5X | GDDR5 | GDDR5 | GDDR5 |
| Memory Clock | 10Gbps | 7Gbps | 7Gbps | 7Gbps |
| Memory Bandwidth (GB/s) | 320 | 336 | 224 | 336 |
| Power Connector | 8-pin | 8-pin + 6-pin | 6-pin + 6-pin | 8-pin + 6-pin |
| TDP (watts) | 180 | 250 | 165 | 250 |
| Launch MSRP | $699* FE | $649 | $549 | $699 |
Narrow and fast gives the GTX 1080 a huge on-paper shading advantage over the GTX 980 and GTX 980 Ti and, once the extra compression capabilities of the Pascal architecture are taken into account, more-than-competitive memory bandwidth. We asked Alben if these kinds of numbers were true representations of potential performance, to which he replied: 'I wouldn't be very good at my job if you could accurately determine all-round performance from FLOPS and GB/s alone'.
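For reference, the raw bandwidth figures in the table follow from bus width and per-pin data rate alone. The sketch below reproduces them; it deliberately ignores the compression gains mentioned above, which can't be derived from the spec sheet. The function and dictionary names are ours.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Raw memory bandwidth in GB/s: bytes moved per transfer times per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps


# (bus width in bits, per-pin data rate in Gbps), taken from the table
cards = {
    "GTX 1080": (256, 10),
    "GTX 980 Ti": (384, 7),
    "GTX 980": (256, 7),
    "GTX 780 Ti": (384, 7),
}

for name, (bus, rate) in cards.items():
    print(f"{name}: {memory_bandwidth_gbs(bus, rate):.0f} GB/s")
```

GDDR5X at 10Gbps is what lets a 256-bit bus land within five per cent of the 384-bit GTX 980 Ti.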
A Different Viewpoint
One such example of Alben's statement rests with how Pascal copes with the demands of virtual reality. You know how it appears that nothing of note has changed in the high-level block diagram? Well, that's not quite true, as Nvidia has added some extra hardware logic to help ease the load imposed by VR and angle-correct multi-monitor gaming, and it goes by the name of simultaneous multi-projection (SMP).
Architecturally speaking, the unit sits after the initial shading and geometry stages within the graphics pipeline. Its purpose is to render multiple viewpoints from a single pass through the geometry stages, rather than re-running geometry once per viewpoint, which saves significant processing.
There's no obvious benefit when looking at a scene head-on, as you would on a normal monitor, as a single viewpoint is enough, but consider the different viewpoints when looking at a multi-monitor surround setup - each one requires a slightly different viewing angle or projection, so, ideally, you'd want to run the geometry and front-end setup three times to have it look perfect. Graphics cards don't currently do this because it is too expensive to run multiple passes that each have to be processed through the pipeline, and the current 'cheat', if you will, is to run one viewpoint and expand it out: this is why multi-monitor gaming looks stretched and imperfect.
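One way to see where the stretching comes from is a small sketch that assumes the single wide viewpoint is an ordinary planar perspective projection (our assumption; Nvidia didn't spell out the maths). Under that model a feature at angle θ off the view axis lands at screen position tan θ, so a small angular width is magnified by 1/cos²θ relative to the screen centre. The sample angles are illustrative, not taken from any particular surround setup.

```python
import math


def horizontal_stretch(angle_deg: float) -> float:
    """Magnification of a small angular width at angle_deg off-axis,
    relative to the screen centre, for a planar projection x = tan(theta)."""
    return 1.0 / math.cos(math.radians(angle_deg)) ** 2


# Illustrative angles: centre of the middle screen, then further out
for angle in (0, 30, 60):
    print(angle, round(horizontal_stretch(angle), 2))
```

In this model objects near the outer edge of a very wide view are drawn several times wider than at the centre, which is the distortion a per-monitor projection avoids.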
The fact that the SMP engine runs after all the data has been gathered from the three main shaders and reuses it means enabling multiple viewpoints is much less of a processing burden than it would otherwise be. Pascal can run 16 viewpoints in one fell swoop, with what is known as two projection centres, so it can replicate the geometry up to 32 times. Of course, no pre-Pascal GPU would ever consider doing this.
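As a toy model of the saving, the sketch below simply counts front-end geometry passes with and without SMP, using the 16-viewpoint and two-projection-centre limits quoted above. It is a counting exercise under those stated figures, not a description of how the hardware schedules work, and the names are hypothetical.

```python
VIEWPOINTS_PER_CENTRE = 16  # from the text
PROJECTION_CENTRES = 2      # from the text
MAX_REPLICATIONS = VIEWPOINTS_PER_CENTRE * PROJECTION_CENTRES  # 32


def geometry_passes(views: int, smp: bool) -> int:
    """Number of times the geometry front-end runs to produce `views` viewpoints."""
    if views < 1:
        raise ValueError("views must be at least 1")
    if not smp:
        return views  # one full front-end pass per viewpoint
    # With SMP, one pass is replicated to as many as 32 viewpoints
    return -(-views // MAX_REPLICATIONS)  # ceiling division


print(geometry_passes(3, smp=False), geometry_passes(3, smp=True))  # surround: 3 1
print(geometry_passes(2, smp=False), geometry_passes(2, smp=True))  # stereo VR: 2 1
```

Pixel shading still scales with the number of pixels drawn, so the model only captures the front-end part of the saving.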
And it is the dual projection centres that help in VR. The initial setup is done once and the SMP engine then creates the correct viewpoints for each eye, without having to re-run the front-end, though there's still obvious extra work for pixel shading. Nvidia calls this technology single-pass stereo, and adding multiple viewpoints on top (remember, Pascal can support up to 16, though only four will be supported initially in an upcoming driver) helps in turning a regular 2D image into what a VR headset actually displays. Essentially, a number of different projections, which are more or less free to create on the SMP, are mashed together to form an overall output that is as close to the lens shape as possible.
What's being shown in the above picture is somewhat simplified. The SMP's two projection centres reduce the geometry work from 2x to 1x in the first instance, then multiple viewpoints are combined and meshed to match the shape of the VR lens, in this case an Oculus Rift, to reduce the amount of pixel processing needed. Point is, SMP is a clever bit of technology.