
Review: Twin-gun AMD R700 aims to blow NVIDIA out of the water

by Tarinder Sandhu on 14 July 2008, 18:41

Tags: ATI Radeon HD 4870 X2, GeForce GTX 280, AMD (NYSE:AMD), ATi Technologies (NYSE:AMD), NVIDIA (NASDAQ:NVDA)

Quick Link: HEXUS.net/qan3c


What is ATI Radeon HD 4870 X2, aka AMD R700?

We cannot emphasise enough that the AMD R700, which HEXUS has tested, is an early engineering sample.

AMD has not confirmed the final AMD R700 specification or disclosed final pricing to HEXUS.

The final specifications of AMD R700 may ultimately differ in detail from those we've reported.

Some of the AMD R700 specifications detailed below are a logical extrapolation of known data relating to ATI Radeon HD 4870 in CrossFire.

Furthermore, all measured performance is very likely to improve as the product, its BIOS and its device drivers are optimised to production status ahead of the SKU launching and becoming available to purchase.

Any pricing estimates we have made are likely to be conservative.

HEXUS anticipates having confirmation of the final specification and pricing towards the end of July.

So what is the Radeon HD 4870 X2?

| Graphics cards | ATI Radeon HD 4870 X2 1024MiB | ATI Radeon HD 4870 512MiB | ATI Radeon HD 4850 512MiB | ATI Radeon HD 3870 X2 1024 | ATI Radeon HD 3870 512 | NVIDIA GeForce GTX 280 | NVIDIA GeForce GTX 260 | NVIDIA GeForce 9800 GX2 | NVIDIA GeForce 9800 GTX+ 512 | NVIDIA GeForce 9800 GTX |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPU(s) clock | 750MHz | 750MHz | 625MHz | 825MHz | 775MHz | 602MHz | 576MHz | 600MHz | 738MHz | 675MHz |
| Shader clock | 750MHz | 750MHz | 625MHz | 825MHz | 775MHz | 1,296MHz | 1,242MHz | 1,500MHz | 1,836MHz | 1,688MHz |
| Memory clock (effective) | 3,600MHz | 3,600MHz | 2,000MHz | 1,802MHz | 2,250MHz | 2,214MHz | 2,000MHz | 2,000MHz | 2,200MHz | 2,200MHz |
| Memory interface, size and type | 512-bit (2x 256-bit), 1,024MiB, GDDR5 | 256-bit, 512MiB, GDDR5 | 256-bit, 512MiB, GDDR3 | 512-bit (2x 256-bit), 1,024MiB, GDDR3 | 256-bit, 512MiB, GDDR4 | 512-bit, 1,024MiB, GDDR3 | 448-bit, 896MiB, GDDR3 | 512-bit (2x 256-bit), 1,024MiB, GDDR3 | 256-bit, 512MiB, GDDR3 | 256-bit, 512MiB, GDDR3 |
| Memory bandwidth | 230GiB/sec | 115GiB/sec | 64GiB/sec | 115.33GiB/sec | 72.8GiB/sec | 141.7GiB/sec | 111.9GiB/sec | 128GiB/sec | 70.4GiB/sec | 70.4GiB/sec |
| Manufacturing process | TSMC, 55nm | TSMC, 55nm | TSMC, 55nm | TSMC, 55nm | TSMC, 55nm | TSMC, 65nm | TSMC, 65nm | TSMC, 65nm | TSMC, 55nm | TSMC, 65nm |
| Transistor count | 1,930M | 965M | 965M | 1,300M | 666M | 1,408M | 1,408M | 1,508M | 754M | 754M |
| Die size | 2x 260mm² | 260mm² | 260mm² | 2x 192mm² | 192mm² | 576mm² | 576mm² | 2x 330mm² | 230mm² | 330mm² |
| Double-precision support | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No |
| DirectX, Shader Model | DX10.1, 4.1 | DX10.1, 4.1 | DX10.1, 4.1 | DX10.1, 4.1 | DX10.1, 4.1 | DX10, 4.0 | DX10, 4.0 | DX10, 4.0 | DX10, 4.0 | DX10, 4.0 |
| Vertex, fragment, geometry shading (shared) | 1,600 FP32 scalar ALUs, MADD dual-issue (unified) | 800 FP32 scalar ALUs, MADD dual-issue (unified) | 800 FP32 scalar ALUs, MADD dual-issue (unified) | 640 FP32 scalar ALUs, MADD dual-issue (unified) | 320 FP32 scalar ALUs, MADD dual-issue (unified) | 240 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 192 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 128 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 128 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 128 FP32 scalar ALUs, MADD dual-issue + MUL (unified) |
| Peak GFLOPS | 2,400 | 1,200 | 1,000 | 1,056 | 496 | 933* | 715* | 768/1,152* | 470/705* | 432/648* |
| Data sampling and filtering | 80ppc address and 80ppc bilinear INT8/40ppc FP16 filtering, max 16xAF | 40ppc address and 40ppc bilinear INT8/20ppc FP16 filtering, max 16xAF | 40ppc address and 40ppc bilinear INT8/20ppc FP16 filtering, max 16xAF | 32ppc address and 32ppc bilinear INT8/FP16 filtering, max 16xAF | 16ppc address and 16ppc bilinear INT8/FP16 filtering, max 16xAF | 80ppc address and 80ppc bilinear INT8/40ppc FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF | 128ppc address and 64ppc bilinear INT8/64ppc FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF |
| Peak fillrate (Gpixels/s) | 24 | 12 | 10 | 26.4 | 12.4 | 19.264 | 16.128 | 19.2 | 11.8 | 10.8 |
| Peak Gtexel/s (bilinear) | 60 | 30 | 25 | 26.4 | 12.4 | 48.16 | 36.864 | 76.8 | 47.2 | 43.2 |
| Peak Gtexel/s (FP16, bilinear) | 30 | 15 | 12.5 | 26.4 | 12.4 | 24.09 | 18.432 | 38.4 | 23.6 | 21.6 |
| ROPs | 32 | 16 | 16 | 32 | 16 | 32 | 28 | 32 | 16 | 16 |
| Peak TDP (claimed, W) | 320 | 160 | 110 | 196 | 105 | 236 | 182 | 196 | Unknown | 155 |
| Power connectors (default clocked) | 8-pin + 6-pin | 6-pin + 6-pin | 6-pin | 8-pin + 6-pin | 6-pin | 8-pin + 6-pin | 6-pin + 6-pin | 8-pin + 6-pin | 6-pin + 6-pin | 6-pin + 6-pin |
| Multi-GPU | CrossFire - two-board | CrossFire - four-board | CrossFire - four-board | CrossFire - two-board | CrossFire - four-board | SLI - three-board | SLI - three-board | SLI - two-board | SLI - three-board | SLI - three-board |
| Reference cooler | dual-slot | dual-slot | single-slot | dual-slot | dual-slot | dual-slot | dual-slot | dual-slot | dual-slot | dual-slot |
| Retail price (default-clocked model) | £349** | £179 | £125 | £229 | £89 | £349 | £219 | £299 | £149** | £129 |

Specifications shared across groups of cards:

PCIe (all cards): PCIe 2.0
Outputs (Radeon HD 4870 X2, HD 4870, HD 4850): 2 x dual-link DVI w/HDCP, HDMI 7.1 (native, on GPU)
Outputs (Radeon HD 3870 X2, HD 3870): 2 x dual-link DVI w/HDCP, HDMI 5.1 (native, on GPU)
Outputs (all GeForce cards): 2 x dual-link DVI w/HDCP, native HDMI 5.1 (via S/PDIF)
Hardware-assisted video-decoding engine (all Radeon cards): AMD UVD - full H.264 and VC-1 decode
Hardware-assisted video-decoding engine (all GeForce cards): NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode


* calculated on a three FLOPS per clock cycle basis.

** estimated pricing
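
The peak GFLOPS figures in the table follow from simple arithmetic: ALU count multiplied by shader clock and by the number of floating-point operations each ALU can issue per clock. The sketch below reproduces a few of them, assuming two FLOPS per clock for a MADD and three when the extra MUL is counted; the function name `peak_gflops` is ours, purely for illustration.

```python
def peak_gflops(alus: int, shader_clock_mhz: float, flops_per_clock: int) -> float:
    """Theoretical peak throughput in GFLOPS (ALUs x clock x FLOPS per clock)."""
    return alus * shader_clock_mhz * flops_per_clock / 1000.0


# Radeon HD 4870: 800 ALUs at 750MHz, MADD only (2 FLOPS per clock).
hd4870 = peak_gflops(800, 750, 2)

# Radeon HD 4870 X2: two HD 4870 GPUs, so 1,600 ALUs at the same clock.
hd4870_x2 = peak_gflops(1600, 750, 2)

# GeForce GTX 280: 240 ALUs at 1,296MHz, counted with MADD + MUL (3 FLOPS per clock).
gtx280 = peak_gflops(240, 1296, 3)

print(hd4870, hd4870_x2, gtx280)
```

These are theoretical ceilings only; they say nothing about how much of that throughput a real game extracts.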

Performance conjecture

There is nothing intrinsically clever about what ATI is doing here. Much in the vein of the also-twin-GPU Radeon HD 3870 X2, the new card places two of ATI's fastest-clocked single GPUs - HD 4870s - on to one card. The difference here is that the HD 4870 X2's frequencies are exactly the same as the regular card's, running at 750MHz for core and shaders and 3,600MHz (effective) for the smokin' GDDR5 memory.

Each of the card's twin GPUs has access to its own 512MiB frame-buffer, of course, and both are connected via a PCIe 2.0 conduit that's mounted on to the PCB. The card-based numbers are extraordinary: 2.4TFLOPS of math calculation, 60Gtexels/s of bilinear filtering and 230GiB/s of combined memory bandwidth, with the compute and bandwidth figures comfortably higher than those of any card that's come before.
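
For readers who want to check the headline figures, the following is a minimal sketch of the usual back-of-the-envelope formulas: bandwidth as bus width in bytes multiplied by the effective memory clock, and fill or texel rate as units per clock multiplied by the core clock. The helper names are hypothetical, and the inputs are taken from the specification table, treating the card as a single 512-bit, 32-ROP, 80-texture-unit device.

```python
def memory_bandwidth(bus_width_bits: int, effective_clock_mhz: float) -> float:
    """Peak memory bandwidth in units of 10^9 bytes per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_clock_mhz / 1000.0


def per_clock_rate(units_per_clock: int, core_clock_mhz: float) -> float:
    """Peak pixel or texel rate in units of 10^9 per second."""
    return units_per_clock * core_clock_mhz / 1000.0


# Radeon HD 4870 X2, combined across both GPUs (values from the table).
bandwidth = memory_bandwidth(512, 3600)   # 2x 256-bit GDDR5 at 3,600MHz effective
pixel_fill = per_clock_rate(32, 750)      # 32 ROPs at 750MHz
texel_int8 = per_clock_rate(80, 750)      # 80 bilinear INT8 texels per clock
texel_fp16 = per_clock_rate(40, 750)      # 40 bilinear FP16 texels per clock

print(bandwidth, pixel_fill, texel_int8, texel_fp16)
```

Summing across both GPUs flatters the card somewhat, since each GPU can only reach its own 512MiB frame-buffer at half the combined bandwidth.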

It would not be unreasonable to assume that performance will be very much akin to two discrete Radeon HD 4870 boards placed in two-way CrossFire.

The inherent foible in using any multi-GPU card lies with potentially sub-standard software implementation. For example, should a game not have a CrossFire(X) profile programmed in to the drivers, it is likely that 3D acceleration will take place on only a single GPU, leaving the other completely redundant.

Even if a CrossFire profile is present, which should be the case for a large proportion of games, multi-GPU speed-up may not approach the 2x that the specifications suggest, especially with DX10-based titles.

Is a 30 per cent or 40 per cent speed-up enough to justify double the cost of a single card? That's one question that may well arise as we look at benchmark numbers.
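
To put that question in numbers, here is a rough sketch that compares performance per pound for the dual-GPU card against a single HD 4870. It assumes the table's prices (£179 for the HD 4870 and the estimated £349 for the HD 4870 X2) and treats the speed-up values as hypothetical inputs, not measured results; `relative_value` is an illustrative name.

```python
def relative_value(speedup: float, single_price: float, dual_price: float) -> float:
    """Performance per pound of the dual-GPU card relative to the single card.

    A result of 1.0 means both cards offer the same performance per pound;
    below 1.0 the single card is the better value.
    """
    cost_ratio = dual_price / single_price
    return speedup / cost_ratio


SINGLE_PRICE = 179.0  # Radeon HD 4870, from the table
DUAL_PRICE = 349.0    # Radeon HD 4870 X2, estimated price from the table

# Hypothetical speed-ups: 30 per cent, 40 per cent and the ideal 2x.
for speedup in (1.3, 1.4, 2.0):
    value = relative_value(speedup, SINGLE_PRICE, DUAL_PRICE)
    print(f"{speedup:.1f}x speed-up -> {value:.2f}x performance per pound")
```

Under those assumptions, the X2 would need a speed-up of roughly 1.95x just to match the single card's performance per pound.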