Tell me how it stacks up
Trotting out the comparison table

Graphics cards | NVIDIA GeForce GTX 280 1024 | NVIDIA GeForce GTX 260 896 | NVIDIA GeForce 9800 GX2 1024 | NVIDIA GeForce 9800 GTX 512 | NVIDIA GeForce 8800 GTS 512 | NVIDIA GeForce 8800 Ultra 768 | ATI Radeon HD 3870 X2 1024 | ATI Radeon HD 3870 512 |
---|---|---|---|---|---|---|---|---|
PCIe | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 1.x | PCIe 2.0 | PCIe 2.0 |
GPU clock | 602MHz | 576MHz | 600MHz | 675MHz | 650MHz | 612MHz | 825MHz | 775MHz |
Shader clock | 1,296MHz | 1,242MHz | 1,500MHz | 1,688MHz | 1,625MHz | 1,500MHz | 825MHz | 775MHz |
Memory clock (effective) | 2,214MHz | 1,998MHz | 2,000MHz | 2,200MHz | 1,940MHz | 2,160MHz | 1,802MHz | 2,250MHz |
Memory interface, size, and implementation | 512-bit, 1,024MiB, GDDR3 | 448-bit, 896MiB, GDDR3 | 2x 256-bit, 1,024MiB, GDDR3 | 256-bit, 512MiB, GDDR3 | 256-bit, 512MiB, GDDR3 | 384-bit, 768MiB, GDDR3 | 2x 256-bit, 1,024MiB, GDDR3 | 256-bit, 512MiB, GDDR4 |
Memory bandwidth | 141.7GiB/sec | 111.90GiB/sec | 128GiB/sec (card) | 70.40GiB/sec | 62.1GiB/sec | 103.68GiB/sec | 115.33GiB/sec (card) | 72.8GiB/sec |
Manufacturing process | TSMC, 65nm | TSMC, 65nm | TSMC, 65nm | TSMC, 65nm | TSMC, 65nm | TSMC, 90nm | TSMC, 55nm | TSMC, 55nm |
Transistor count | 1,408M | 1,408M | 1,508M | 754M | 754M | 681M | 1,300M | 666M |
Die size | Unknown (big) | Unknown (big) | 2x 296mm² | 330mm² | 330mm² | 484mm² | 2x 192mm² | 192mm² |
DirectX Shader Model | DX10, 4.0 | DX10, 4.0 | DX10, 4.0 | DX10, 4.0 | DX10, 4.0 | DX10, 4.0 | DX10.1, 4.1 | DX10.1, 4.1 |
Vertex, fragment, geometry shading (shared) | 240 FP32 scalar ALUs, MADD dual-issue (unified) | 192 FP32 scalar ALUs, MADD dual-issue (unified) | 256 FP32 scalar ALUs, MADD dual-issue (unified) | 128 FP32 scalar ALUs, MADD dual-issue (unified) | 128 FP32 scalar ALUs, MADD dual-issue (unified) | 128 FP32 scalar ALUs, MADD dual-issue (unified) | 640 FP32 scalar ALUs, MADD dual-issue (unified) | 320 FP32 scalar ALUs, MADD dual-issue (unified) |
Peak GFLOP/s | 933* | 715* | 768/1152* | 432/648* | 416/624* | 384/576* | 1,056 | 496 |
Data sampling and filtering | 80ppc address and 80ppc bilinear (8-bit integer)/40ppc FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear (8-bit integer)/32ppc FP16 filtering, max 16xAF | 128ppc address and 128ppc bilinear INT8/64ppc FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF | 32ppc address and 32ppc bilinear INT8/32ppc FP16 filtering, max 16xAF | 32ppc address and 32ppc bilinear INT8/FP16 filtering, max 16xAF | 16ppc address and 16ppc bilinear INT8/FP16 filtering, max 16xAF |
Peak fillrate Gpixels/s | 19.264 | 16.128 | 19.2 | 10.8 | 10.4 | 14.688 | 26.4 | 12.4 |
Peak Gtexel/s (bilinear) | 48.16 | 36.864 | 76.8 | 43.2 | 41.6 | 19.584 | 26.4 | 12.4 |
Peak Gtexel/s (FP16, bilinear) | 24.08 | 18.432 | 38.4 | 21.6 | 20.8 | 19.584 | 26.4 | 12.4 |
ROPs | 32 | 28 | 32 | 16 | 16 | 24 | 32 | 16 |
Peak TDP (claimed, watts) | 236 | 182 | 196 | 156 | 140 | 175 | 196 | 110 |
Power connectors (default clock) | 8-pin + 6-pin | 6-pin + 6-pin | 8-pin + 6-pin | 6-pin + 6-pin | 6-pin | 6-pin + 6-pin | 8-pin + 6-pin | 6-pin |
Multi-GPU | SLI - three-board | SLI - three-board | SLI - two-board | SLI - three-board | SLI - two-board | SLI - three-board | CrossFire - two-board | CrossFire - four-board |
Outputs | 2 x dual-link DVI w/HDCP, HDMI, mini-DIN | 2 x dual-link DVI w/HDCP, mini-DIN | 2 x dual-link DVI w/HDCP, HDMI | 2 x dual-link DVI w/HDCP, HDMI (native, on GPU) | 2 x dual-link DVI w/HDCP, mini-DIN | 2 x dual-link DVI w/HDCP (discrete ASIC), mini-DIN | 2 x dual-link DVI (HDMI) w/HDCP, mini-DIN (VIVO) | |
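Hardware-assisted video-decoding engine | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode | NVIDIA PureVideo HD 1st gen | AMD UVD - full H.264 and VC-1 decode | AMD UVD - full H.264 and VC-1 decode |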
Reference cooler | dual-slot | dual-slot | dual-slot | dual-slot | dual-slot | dual-slot | dual-slot | dual-slot |
Retail price (default-clocked model) | £449 | £299** | £299 | £185 | £145 | £299 (hard to find) | £239 | £89 |
* calculated on a three-FLOPS basis. Other GeForces are shown on two- and three-FLOPS throughput.
** predicted pricing
GeForce GTX 260: you've not told me about that yet
The 'budget' next-generation offering is the GTX 260.
Based on GTX 280 (duh!), it's a case of tried-and-trusted snipping of
various counts. The table shows that it's clocked lower on all
fronts and has fewer stream processors - 192 versus 240 - arranged in
eight clusters versus 10. It also has one ROP partition removed,
resulting in seven blocks of four, or 28 in total.
With one ROP partition lopped off, GTX 260 has a seven-channel memory
interface, each channel 64 bits wide. Add them together and you have a
448-bit interface with 2GHz-rated memory. Calculator-time tells you
that's potential bandwidth in the region of 112GiB/s (448/8 x 1,998).
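The calculator work can be sketched in a few lines of Python. This is only a rough illustration: the helper name `peak_bandwidth` is our own invention, and the inputs are the bus widths and effective memory clocks listed in the table.

```python
def peak_bandwidth(bus_width_bits: int, effective_clock_mhz: float) -> float:
    """Peak memory bandwidth, on the same scale as the table's bandwidth row.

    Hypothetical helper: bytes moved per transfer multiplied by the
    effective (data-rate) memory clock, then divided by 1,000.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_clock_mhz / 1000


gtx_260 = peak_bandwidth(448, 1998)  # seven 64-bit channels
gtx_280 = peak_bandwidth(512, 2214)  # eight 64-bit channels

print(f"GTX 260: {gtx_260:.1f}")
print(f"GTX 280: {gtx_280:.1f}")
```

Run as-is, it prints 111.9 for GTX 260 and 141.7 for GTX 280, matching the bandwidth row of the table.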
The financial implication is that it will be cheaper, by around
£150, we reckon.
The meat on the bones
The GPU clocks of 602/576MHz tie in with what NVIDIA has
pushed out in the last 18 months. Knowing that the GTX 280 can
bilinear-filter 80ppc and bilinear FP16 (16-bit floating point) filter
40ppc, the peak Gtexel/s throughput is higher than that of any single
GPU that has gone before.
However, the twin-GPU GeForce 9800 GX2 handsomely beats out GTX
280 in both bilinear INT8 and FP16 throughput. Similarly, the twin-GPU
Radeon HD 3870 X2 FP16 throughput is higher, too.
Even so, a healthy dose of ROPs also keeps the GTX 280's fillrate the
highest of any
single GPU.
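For anyone wanting to check the texturing and fillrate comparisons, here is a minimal sketch. It assumes, as the table appears to, that each peak rate is simply the per-clock figure multiplied by the core clock; the helper name `giga_per_second` and the dictionary layout are ours, and the twin-GPU entries use the combined per-card figures from the table.

```python
# Figures lifted from the comparison table:
# (core clock in MHz, INT8 bilinear texels per clock, FP16 texels per clock, ROPs)
CARDS = {
    "GeForce GTX 280": (602, 80, 40, 32),
    "GeForce 9800 GX2": (600, 128, 64, 32),   # both GPUs combined
    "GeForce 9800 GTX": (675, 64, 32, 16),
    "Radeon HD 3870 X2": (825, 32, 32, 32),   # both GPUs combined
}


def giga_per_second(units_per_clock: int, clock_mhz: float) -> float:
    """Hypothetical helper: per-clock throughput scaled by clock speed, in G/s."""
    return units_per_clock * clock_mhz / 1000


rates = {}
for name, (clock, int8_ppc, fp16_ppc, rops) in CARDS.items():
    rates[name] = {
        "int8_gtexels": giga_per_second(int8_ppc, clock),
        "fp16_gtexels": giga_per_second(fp16_ppc, clock),
        "gpixels": giga_per_second(rops, clock),
    }
    r = rates[name]
    print(f"{name}: {r['int8_gtexels']:.2f} INT8 Gtexels/s, "
          f"{r['fp16_gtexels']:.2f} FP16 Gtexels/s, {r['gpixels']:.3f} Gpixels/s")
```

On these numbers the 9800 GX2 leads GTX 280 on both texel rates and the HD 3870 X2 edges it on FP16, while GTX 280 stays well ahead of the single-GPU 9800 GTX on pixel fillrate.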
On the down side, the shader clock is reduced from the 1,500MHz+ that
we're accustomed to seeing of late. Even so, assuming that we count the
design as three FLOPS per ALU, per clock, the peak GFLOPS rate is also
good, helped, no doubt, by the sheer number of processors at hand.
There's plenty of shading horsepower, then, allied to impressive
fillrate and acceptable 16-bit floating-point bilinear filtering.
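The GFLOPS sums are easy enough to reproduce. The sketch below assumes peak throughput is just ALU count x FLOPS per clock x shader clock, which is how the table's figures appear to have been derived; `peak_gflops` is a name we've made up for illustration.

```python
def peak_gflops(alus: int, shader_clock_mhz: float, flops_per_clock: int) -> float:
    """Hypothetical helper: theoretical peak GFLOP/s for a unified shader array."""
    return alus * flops_per_clock * shader_clock_mhz / 1000


# ALU counts and shader clocks taken from the comparison table.
gtx_280 = peak_gflops(240, 1296, 3)
gtx_260 = peak_gflops(192, 1242, 3)
g9800_gtx_two = peak_gflops(128, 1688, 2)    # two-FLOPS counting
g9800_gtx_three = peak_gflops(128, 1688, 3)  # three-FLOPS counting

print(f"GTX 280:  {gtx_280:.0f}")
print(f"GTX 260:  {gtx_260:.0f}")
print(f"9800 GTX: {g9800_gtx_two:.0f}/{g9800_gtx_three:.0f}")
```

That gives 933 and 715 for the two new cards and 432/648 for the 9800 GTX, so the lower shader clock is more than offset by the extra ALUs.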
GTX 280 uses a 512-bit memory interface that's paired with high-speed
GDDR3 RAM operating at an effective 2,214MHz. Adding it up, GTX 280
offers over 140GiB/s of juicy bandwidth. As we noted above, GTX 260's
bandwidth is reduced on two fronts: interface size and DRAM speed. This
results in around 112GiB/s. Both figures are higher than on any single
GPU released in volume.
Power and form
The big, meaty GPUs in the GTX 200-series are based on a 65nm
manufacturing process, unchanged from the newer 8-series and all
9-series GPUs.
That means heat, and lots of it. We've alluded to a
maximum TDP of 236W for the range-topping model. The GTX 260 fares a
little better, drawing up to 182W - but even that is a high number.
We're getting into the realms where air cooling simply won't cut it. A
smaller manufacturing process reduces the TDP, but the smaller die that
comes with it has a heat-related cost: significant wattage has to be
shifted per square mm. Obligatory liquid-based cooling isn't that far
off, we reckon.
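To put a rough number on wattage per square mm, here is a back-of-the-envelope sketch using older cards from the table, since GTX 280's die size is listed as unknown. Treat it as an upper bound at best: the claimed TDP covers the whole board (memory and power circuitry included), not just the GPU, and the dictionary layout is purely our own.

```python
# (claimed peak board TDP in watts, die area in mm²), both from the table.
# Assumption: board TDP stands in for GPU power, which overstates the density.
CARDS = {
    "GeForce 8800 Ultra (90nm)": (175, 484),
    "GeForce 9800 GTX (65nm)": (156, 330),
    "Radeon HD 3870 (55nm)": (110, 192),
}

density = {name: tdp / area for name, (tdp, area) in CARDS.items()}

for name, watts_per_mm2 in density.items():
    print(f"{name}: {watts_per_mm2:.2f} W/mm²")
```

Even with the caveats, the trend is the one described: the smaller the process and die, the more watts each square mm has to shift.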
NVIDIA has designed a revised dual-slot cooler to stop the beasties
getting too warm but we'd have preferred to have seen the GPUs on the
half-node 55nm process. Doubtless that will come later in the year.
Outputs
Nothing much has changed here, either. Unlike ATI with its
3000-series GPUs, there's no native HDMI or DisplayPort provision, and
both will be added, by AIBs, via separate ASICs. The video-processing
engine remains the same as 9-series,
too.
Price and launch SKUs
All the architecture talk in the world can be rendered
irrelevant by pricing. At the time of writing, UK pricing for
default-clocked cards is gravitating towards £449 (ouch!), while US
pricing is around $649 and Euro pricing around €499.
As demand tends to outstrip supply in the first week of a new
architecture's release, we expect the pricing to drop around 10 per
cent within a month. Still, it's hugely expensive for a card with a
single GPU.
Looking at the immediate past, ATI has forsaken performance leadership
and concentrated on the value perspective. NVIDIA's big and hot GPU
almost needs to be expensive, to counter the die cost and engineering
that's gone into it.
What NVIDIA will be aiming to do, and most likely will do, is derive
lower-cost SKUs by whipping out the old architecture-busting hammer -
and get them to retail quickly before ATI is able to respond with its
next-generation mid-range part, RV770.
Basic summary
The specifications tell us that GeForce GTX 280 will be
the fastest single-GPU card around: no question, really. It has more
shading, fillrate, multitexturing, and memory bandwidth than any other
single GPU shown in the table above.
What's interesting is that it may not be faster than the
fudged-together GeForce 9800 GX2 in certain circumstances that rely on
heavy texturing. That card is, in effect, based on a couple of GPUs
that were available around 18 months ago.
NVIDIA is keen to push the GPGPU or parallel computing architecture
facets of the GTX 280/260, but don't let that fool you; all GeForce
8-series
and 9-series cards can run CUDA and accelerate non-gaming tasks.
Bigger, faster, wider, more power-hungry: NVIDIA's taken the
brute-strength approach with the newest iteration
of GPUs. We had hoped for something a little more elegant, however.
On to the tootin', rootin' card now.