A TITAN from Tesla
Editor's note: this is an article covering the architecture, specifications and appearance of the upcoming GeForce GTX TITAN graphics card. We are not permitted to disclose performance numbers until a later date. All images in this article are provided by NVIDIA PR.
AMD: we're better than NVIDIA at every price point
It may seem trite to start off a technical evaluation for an NVIDIA GeForce GPU by referencing a belligerent belief from AMD, but, in a call with executives, the red team reckons it has the best consumer graphics cards at every price point. It says the Radeon-infused ASUS ARES II is the 'world's fastest graphics card' and the Radeon HD 7970 GHz Edition is the 'best GPU.'
NVIDIA may well take umbrage at both of those statements, and while reining in the ARES II is perhaps outside the performance remit of even the best GTX 690s, the GeForce gang is today launching its best-ever GPU, the GeForce GTX TITAN. NVIDIA says it is faster and more elegant than any single-GPU consumer graphics card that has hitherto been available, so let's uncover just how titanic this new behemoth is.
State of play
But it's worth setting the scene first. AMD's release of the Radeon HD 7970 GHz Edition, together with a flurry of driver improvements in recent months, has seen this Tahiti-based, GCN-architecture card open up a wholesome lead over NVIDIA's incumbent single-GPU champ, the GeForce GTX 680. We believe the gap, evaluated over modern games, is large enough to hinder NVIDIA's ambitions of (re)taking the performance crown with GTX 680 models clocked at ever-higher speeds. What's needed is a new architecture - a next-gen, Maxwell-powered GeForce GTX 780 - but the pace of GPU development has slowed enough that we're unlikely to see bona fide 7-series GeForces until 2014.
NVIDIA, therefore, wishes to beat up on the Radeon HD 7970 GHz Edition but doesn't appear to have the single-GPU muscle to do so... or does it? If you take a look at the professional graphics accelerators from the green team - Quadro and Tesla - which are ostensibly based on the same Kepler underpinnings, NVIDIA released the Tesla K20X back in November 2012. It's special insofar as it uses what is known as the GK110 die - a 7.1-billion-transistor chip designed for performance above all else. Moreover, the K20X uses the Kepler GPU structure in a very wide, parallel sense, where 2,688 cores, 6GB of memory, and a 384-bit memory bus promise top-notch potential performance. Handy for GPU Compute applications such as weather modelling and computational chemistry, this beast costs around $3,000 a pop.
Clearly superior to the GeForce GTX 680 in all performance facets, NVIDIA's GeForce TITAN - a gaming card, remember - grabs hold of this Tesla K20X architecture with both hands, removes a few workstation-related features unnecessary to the gaming crowd, boosts the frequencies, adds some new GPU Boost love, and repurposes all of this moxie into a limited-edition card.
A TITAN from Tesla
Spitting out a bunch of performance benefits doesn't usually paint the entire technical picture, so as normal, here's a mini-Table of Doom™.
| GPU | GeForce GTX TITAN (6,144MB) | GeForce GTX 680 (2,048MB) | GeForce GTX 690 (4,096MB) | Radeon HD 7970 GHz (3,072MB) |
| --- | --- | --- | --- | --- |
| Transistors | 7.1bn | 3.54bn | 3.54bn x 2 | 4.3bn |
| Approx die size | 551mm² | 294mm² | 294mm² x 2 | 352mm² |
| Processors | 2,688 | 1,536 | 1,536 x 2 | 2,048 |
| Texture units | 224 | 128 | 128 x 2 | 128 |
| ROP units | 48 | 32 | 32 x 2 | 32 |
| GPU clock/boost (MHz) | 836 (876) | 1,006 (1,058) | 915 (1,019) | 1,000 (1,050) |
| Shader clock/boost (MHz) | 836 (876) | 1,006 (1,058) | 915 (1,019) | 1,000 (1,050) |
| Memory clock (MHz) | 6,008 | 6,008 | 6,008 | 6,000 |
| Memory bus (bits) | 384 | 256 | 256 x 2 | 384 |
| Max bandwidth (GB/s) | 288.4 | 192.2 | 192.2 x 2 | 288 |
| GFLOPS per watt | 17.98 | 15.84 | 18.74 | 16.38 |
| Current price | $999 (£825) | $449 | $999 | $449 |
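The GFLOPS-per-watt row is straightforward to derive: each Kepler or GCN core can retire one fused multiply-add (two FLOPs) per clock, so peak single-precision throughput is 2 x cores x clock, divided by board power. A minimal sketch, using the base clocks from the table and the boards' rated TDPs (250W for both TITAN and the HD 7970 GHz Edition - the TDPs themselves aren't listed above):

```python
def peak_sp_gflops(cores, clock_mhz):
    # One fused multiply-add (2 FLOPs) per core per clock
    return 2 * cores * clock_mhz / 1000.0

def gflops_per_watt(cores, clock_mhz, tdp_w):
    return peak_sp_gflops(cores, clock_mhz) / tdp_w

# GeForce GTX TITAN: 2,688 cores at 836MHz, 250W
print(round(gflops_per_watt(2688, 836, 250), 2))   # 17.98
# Radeon HD 7970 GHz Edition: 2,048 cores at 1,000MHz, 250W
print(round(gflops_per_watt(2048, 1000, 250), 2))  # 16.38
```

Plug in GTX 690's 3,072 cores, 915MHz and 300W and the same sum lands on its 18.74 figure, too.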
The TITAN chip packs in twice as many transistors and is almost twice as large as a premium GeForce GTX 680. TITAN's 551mm² die size is huge by current standards but, putting it into context, around the same area as a 40nm-based GTX 580 'Fermi' GPU. Housing 75 per cent more cores and texture units, and 50 per cent more ROPs, it really is best thought of as a scaled-up GTX 680. All that horsepower needs to be fed, meaning a 256-bit memory bus ain't gonna cut it; TITAN/K20X harks back to GTX 580 and uses a wider 384-bit bus. And, designed to run at super-high resolutions that are likely to be used in conjunction with high-quality image settings, TITAN is equipped with a 6GB frame-buffer, used exclusively - obviously - by the single GK110 GPU.
Though on-paper performance is scaled up by 50-75 per cent, board power increases by less than 30 per cent over an already-frugal GTX 680. This impressive stat, helped by lower-than-GTX 680 core clocks, enables the GK110-based TITAN to fit into a 250W thermal envelope. You'll see a little later on that NVIDIA has played on this comparatively modest TDP and built a very quiet card around it.
Take another look at the basic vital stats and TITAN has more in common with a Radeon HD 7970 GHz Edition than a GTX 680. GFLOPS, memory bandwidth and power are closer to AMD's best card. And for those wondering if TITAN is likely to eclipse the dual-GPU GTX 690, the answer is, most likely, no. A larger power budget and smooth SLI performance suggest GTX 690 will have a reasonable performance edge in well-optimised games, especially if frame-buffer limitations - GTX 690's GPUs each have access to 'only' 2GB - don't come into play. Putting it another way, two GeForce GTX 680s, in SLI, are likely to be quicker than single-card TITAN. What's more, they'll cost a little less, too.
If we forget about TITAN's Tesla provenance for a moment, and turn a blind eye to the staggering $999 (£825) estimated price, it's not a reach to say that NVIDIA's fastest GPU feels like a GeForce GTX 580 whose Fermi guts have been replaced by energy-efficient Kepler. Should TITAN cost $999? Absolutely not, yet NVIDIA feels comfortable charging it because of presumed performance hegemony.
Not quite a full-fat TITAN
It's clear that this is the fastest GeForce GTX single-GPU around, and it's also clear that most of the design has been harvested from the Tesla K20X. Understanding this salient point reveals that TITAN could have been even faster.
Here's a block diagram for the GK110 die powering both the TITAN and K20X. The complete architecture calls for 15 SMX units that are, as far as we can tell, very similar to the eight that power a GTX 680. Each SMX is home to 192 cores, 32 SFUs, 32 Load/Stores, 16 texture-units, one setup engine (PolyMorph 2.0), and four warp schedulers that can each dole out two 32-thread instructions per clock. Further, each GK110 SMX has 64KB of L1 cache - just like GK104-based GTX 680 - but TITAN also features the K20X's 48KB of L1 texture cache, which can be used to store data produced by compute applications. The entire chip is backed by 1,536KB of L2 cache, as well, up from GTX 680's 512KB.
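Those per-SMX numbers give a feel for issue width: four warp schedulers, each able to dispatch two 32-thread warp instructions per clock, means a single SMX can issue up to eight warp instructions - covering 256 threads - every cycle. A quick back-of-the-envelope sketch:

```python
WARP_SIZE = 32           # threads per warp on Kepler
SCHEDULERS_PER_SMX = 4   # warp schedulers in each SMX
DISPATCH_PER_SCHED = 2   # instructions issued per scheduler per clock

warp_instructions = SCHEDULERS_PER_SMX * DISPATCH_PER_SCHED
threads_covered = warp_instructions * WARP_SIZE
print(warp_instructions, threads_covered)  # 8 256
```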
All good thus far? And very much like a super-sized GTX 680. However, both TITAN and Tesla K20X don't use the full complement of SMX units. Rather, most likely for yield reasons, both top-end accelerators bake in 14 of the 15 SMXs, and this is how the 2,688 card-wide cores (14x192) are composed. A full-fat card would have 2,880 cores (15x192). But hey, what's an SMX between friends?
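The arithmetic behind those core counts is simple, and the same per-SMX multipliers explain the texture-unit figure in the spec table, too - a sketch:

```python
CORES_PER_SMX = 192
TEX_UNITS_PER_SMX = 16

print(14 * CORES_PER_SMX)      # 2688 - TITAN/K20X, with 14 active SMX units
print(15 * CORES_PER_SMX)      # 2880 - a hypothetical full-fat GK110
print(14 * TEX_UNITS_PER_SMX)  # 224  - TITAN's texture units
```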
As on GTX 580, six 64-bit memory partitions constitute the 384-bit interface, and 6GHz-rated GDDR5 memory is allied to these for the 288GB/s of bandwidth. Does a consumer graphics card need 6GB of onboard memory? No, not at all, but NVIDIA wants to keep the Tesla K20X-to-TITAN transformation as simple as possible. And heck, 6GB just sounds fast.
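The bandwidth figure falls straight out of the bus width and memory speed: a 384-bit bus moves 48 bytes per transfer, and at 6,008MT/s effective that's 288.4GB/s. A sketch:

```python
def gddr5_bandwidth_gbps(bus_bits, effective_mtps):
    # Bytes moved per transfer across the whole bus, times transfers per second
    return (bus_bits / 8) * effective_mtps / 1000.0

print(gddr5_bandwidth_gbps(384, 6008))  # 288.384 - TITAN
print(gddr5_bandwidth_gbps(256, 6008))  # 192.256 - GTX 680
```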
I'm a cheap Tesla, then?
The $999 asking price appears prohibitive to the gamer but can also be construed as a Tesla K20X on the (relative) cheap. That reading is supported by the knowledge that, just like the Tesla K20X, TITAN can run double-precision compute at one-third of single-precision speed, leading to over 1TFLOPS of DP throughput. However, being a gamer's card at heart, TITAN's DP rate defaults to 1/24th of SP, just like GTX 680, as no games use double-precision calculations. The full 1/3rd ratio can be enabled via the control panel, yet doing so forces the GPU's clocks down. And no gamer wants that, right?
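Those ratios translate into concrete numbers as follows - a sketch using the 836MHz base clock, and ignoring the clock reduction that enabling the 1/3rd mode imposes (so the real-world Tesla-mode figure will be somewhat lower):

```python
def dp_gflops(cores, clock_mhz, dp_ratio):
    sp = 2 * cores * clock_mhz / 1000.0  # peak single-precision GFLOPS
    return sp * dp_ratio

# Tesla-style 1/3rd-rate mode (before the clocks drop)
print(round(dp_gflops(2688, 836, 1 / 3)))   # 1498 - comfortably over 1TFLOPS
# Default gaming mode at 1/24th rate, as on GTX 680
print(round(dp_gflops(2688, 836, 1 / 24)))  # 187
```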
NVIDIA ensures that those who are willing to pay for Tesla's feature-set continue to do so; TITAN doesn't feature the K20X's ECC memory, Hyper-Q or Grid Management Unit, amongst other $3,000 niceties, though Dynamic Parallelism makes the cut.
A Tesla-to-TITAN makeover wouldn't be complete without NVIDIA experimenting with key technologies. First brought to market with the GTX 680, GPU Boost - where the GPU automatically overclocks when there's scope to do so - is refined and released in v2.0 form for this card. It's a sign of things to come, so let's explore.