What is ATI Radeon HD 4870 X2, aka AMD R700?
We cannot emphasise enough that the AMD R700, which HEXUS has tested, is an early engineering sample.
AMD has not confirmed the final AMD R700 specification or disclosed final pricing to HEXUS.
The final specifications of AMD R700 may ultimately differ in detail from those we've reported.
Some of the AMD R700 specifications detailed below are a logical extrapolation of known data relating to the ATI Radeon HD 4870 in CrossFire.
Furthermore, all measured performance is very likely to improve as the product, its BIOS and its device drivers are optimised to production status, and as the SKU launches and becomes available to purchase.
Any pricing estimates we have made are likely to be conservative.
HEXUS anticipates having confirmation of the final specification and pricing towards the end of July.
So what is the Radeon HD 4870 X2?
Graphics cards | ATI Radeon HD 4870 X2 1024MiB | ATI Radeon HD 4870 512MiB | ATI Radeon HD 4850 512MiB | ATI Radeon HD 3870 X2 1024MiB | ATI Radeon HD 3870 512MiB | NVIDIA GeForce GTX 280 | NVIDIA GeForce GTX 260 | NVIDIA GeForce 9800 GX2 | NVIDIA GeForce 9800 GTX+ 512MiB | NVIDIA GeForce 9800 GTX |
---|---|---|---|---|---|---|---|---|---|---|
PCIe | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 |
GPU(s) clock | 750MHz | 750MHz | 625MHz | 825MHz | 775MHz | 602MHz | 576MHz | 600MHz | 738MHz | 675MHz |
Shader clock | 750MHz | 750MHz | 625MHz | 825MHz | 775MHz | 1,296MHz | 1,242MHz | 1,500MHz | 1,836MHz | 1,688MHz |
Memory clock (effective) | 3,600MHz | 3,600MHz | 2,000MHz | 1,802MHz | 2,250MHz | 2,214MHz | 2,000MHz | 2,000MHz | 2,200MHz | 2,200MHz |
Memory interface and size | 512-bit (2x 256-bit), 1,024MiB, GDDR5 | 256-bit, 512MiB, GDDR5 | 256-bit, 512MiB, GDDR3 | 512-bit (2x 256-bit), 1,024MiB, GDDR3 | 256-bit, 512MiB, GDDR4 | 512-bit, 1,024MiB, GDDR3 | 448-bit, 896MiB, GDDR3 | 512-bit (2x 256-bit), 1,024MiB, GDDR3 | 256-bit, 512MiB, GDDR3 | 256-bit, 512MiB, GDDR3 |
Memory bandwidth | 230GiB/sec | 115GiB/sec | 64GiB/sec | 115.33GiB/sec | 72GiB/sec | 141.7GiB/sec | 111.9GiB/sec | 128GiB/sec | 70.4GiB/sec | 70.4GiB/sec |
Manufacturing process | TSMC, 55nm | TSMC, 55nm | TSMC, 55nm | TSMC, 55nm | TSMC, 55nm | TSMC, 65nm | TSMC, 65nm | TSMC, 65nm | TSMC, 55nm | TSMC, 65nm |
Transistor count | 1,912M | 956M | 956M | 1,332M | 666M | 1,408M | 1,408M | 1,508M | 754M | 754M |
Die size | 2x 260mm² | 260mm² | 260mm² | 2x 192mm² | 192mm² | 576mm² | 576mm² | 2x 330mm² | 230mm² | 330mm² |
Double-precision support | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No |
DirectX Shader Model | DX10.1, 4.1 | DX10.1, 4.1 | DX10.1, 4.1 | DX10.1, 4.1 | DX10.1, 4.1 | DX10, 4.0 | DX10, 4.0 | DX10, 4.0 | DX10, 4.0 | DX10, 4.0 |
Vertex, fragment, geometry shading (shared) | 1,600 FP32 scalar ALUs, MADD dual-issue (unified) | 800 FP32 scalar ALUs, MADD dual-issue (unified) | 800 FP32 scalar ALUs, MADD dual-issue (unified) | 640 FP32 scalar ALUs, MADD dual-issue (unified) | 320 FP32 scalar ALUs, MADD dual-issue (unified) | 240 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 192 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 256 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 128 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 128 FP32 scalar ALUs, MADD dual-issue + MUL (unified) |
Peak GFLOPS | 2,400 | 1,200 | 1,000 | 1,056 | 496 | 622/933* | 477/715* | 768/1,152* | 470/705* | 432/648* |
Data sampling and filtering | 80ppc address and 80ppc bilinear INT8/40ppc FP16 filtering, max 16xAF | 40ppc address and 40ppc bilinear INT8/20ppc FP16 filtering, max 16xAF | 40ppc address and 40ppc bilinear INT8/20ppc FP16 filtering, max 16xAF | 32ppc address and 32ppc bilinear INT8/FP16 filtering, max 16xAF | 16ppc address and 16ppc bilinear INT8/FP16 filtering, max 16xAF | 80ppc address and 80ppc bilinear INT8/40ppc FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF | 128ppc address and 128ppc bilinear INT8/64ppc FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF |
Peak fillrate Gpixels/s | 24 | 12 | 10 | 26.4 | 12.4 | 19.264 | 16.128 | 19.2 | 11.8 | 10.8 |
Peak Gtexel/s (bilinear) | 60 | 30 | 25 | 26.4 | 12.4 | 48.16 | 36.864 | 76.8 | 47.2 | 43.2 |
Peak Gtexel/s (FP16, bilinear) | 30 | 15 | 12.5 | 26.4 | 12.4 | 24.08 | 18.432 | 38.4 | 23.6 | 21.6 |
ROPs | 32 | 16 | 16 | 32 | 16 | 32 | 28 | 32 | 16 | 16 |
Peak TDP, watts (claimed) | 320 | 160 | 110 | 196 | 105 | 236 | 182 | 196 | Unknown | 155 |
Power connectors (default clocked) | 8-pin + 6-pin | 6-pin + 6-pin | 6-pin | 8-pin + 6-pin | 6-pin | 8-pin + 6-pin | 6-pin + 6-pin | 8-pin + 6-pin | 6-pin + 6-pin | 6-pin + 6-pin |
Multi-GPU | CrossFire - two-board | CrossFire - four-board | CrossFire - four-board | CrossFire - two-board | CrossFire - four-board | SLI - three-board | SLI - three-board | SLI - two-board | SLI - three-board | SLI - three-board |
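Outputs | 2 x dual-link DVI w/HDCP, HDMI 7.1 (native, on GPU) | 2 x dual-link DVI w/HDCP, HDMI 7.1 (native, on GPU) | 2 x dual-link DVI w/HDCP, HDMI 7.1 (native, on GPU) | 2 x dual-link DVI w/HDCP, HDMI 5.1 (native, on GPU) | 2 x dual-link DVI w/HDCP, HDMI 5.1 (native, on GPU) | 2 x dual-link DVI w/HDCP, HDMI 5.1 (audio via S/PDIF) | 2 x dual-link DVI w/HDCP, HDMI 5.1 (audio via S/PDIF) | 2 x dual-link DVI w/HDCP, HDMI 5.1 (audio via S/PDIF) | 2 x dual-link DVI w/HDCP, HDMI 5.1 (audio via S/PDIF) | 2 x dual-link DVI w/HDCP, HDMI 5.1 (audio via S/PDIF) |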
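Hardware-assisted video-decoding engine | AMD UVD - full H.264 and VC-1 decode | AMD UVD - full H.264 and VC-1 decode | AMD UVD - full H.264 and VC-1 decode | AMD UVD - full H.264 and VC-1 decode | AMD UVD - full H.264 and VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode |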
Reference cooler | dual-slot | dual-slot | single-slot | dual-slot | dual-slot | dual-slot | dual-slot | dual-slot | dual-slot | dual-slot |
Retail price (default-clocked model) | £349** | £179 | £125 | £229 | £89 | £349 | £219 | £299 | £149** | £129 |
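* calculated on a three FLOPS per clock cycle basis (MADD + MUL); where two figures are given, the first is calculated on a two FLOPS per clock cycle (MADD) basis.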
** estimated pricing
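The computed rows of the table follow from a handful of per-GPU figures: bandwidth is bus width in bytes times effective memory clock, peak GFLOPS is ALUs times shader clock times FLOPS per clock, fill rate is ROPs times core clock, and bilinear texel rate is texels per clock times core clock. The Python sketch below reproduces those rows for a few of the cards. It is only a sketch: the class and method names (`CardSpec`, `bandwidth_gbs` and so on) are our own, the inputs are copied from the table, the X2's 512-bit interface is entered as 256 bits per GPU, and the results come out in decimal gigabytes per second, which is what the table's bandwidth figures work out to.

```python
from dataclasses import dataclass


@dataclass
class CardSpec:
    """Per-GPU unit counts and clocks, plus how many GPUs sit on the card."""
    name: str
    gpus: int            # GPUs on the board (2 for the X2 parts)
    core_mhz: float      # GPU clock, drives ROPs and texture units
    shader_mhz: float    # shader clock (equal to the core clock on ATI parts)
    mem_mhz_eff: float   # effective memory clock
    bus_bits: int        # memory interface width per GPU
    alus: int            # FP32 scalar ALUs per GPU
    texels_ppc: int      # bilinear INT8 texels per clock per GPU
    rops: int            # ROPs per GPU
    extra_mul: bool = False  # NVIDIA-style co-issued MUL (three FLOPS basis)

    def bandwidth_gbs(self) -> float:
        # bytes per transfer x millions of transfers per second, summed over GPUs
        return self.gpus * (self.bus_bits / 8) * self.mem_mhz_eff / 1000

    def gflops(self, flops_per_clock: int = 2) -> float:
        # a MADD counts as two FLOPS per ALU per clock
        return self.gpus * self.alus * self.shader_mhz * flops_per_clock / 1000

    def fill_gpixels(self) -> float:
        return self.gpus * self.rops * self.core_mhz / 1000

    def bilinear_gtexels(self) -> float:
        return self.gpus * self.texels_ppc * self.core_mhz / 1000


cards = [
    CardSpec("Radeon HD 4870 X2", 2, 750, 750, 3600, 256, 800, 40, 16),
    CardSpec("Radeon HD 4870", 1, 750, 750, 3600, 256, 800, 40, 16),
    CardSpec("GeForce GTX 280", 1, 602, 1296, 2214, 512, 240, 80, 32, True),
    CardSpec("GeForce GTX 260", 1, 576, 1242, 2000, 448, 192, 64, 28, True),
]

for c in cards:
    flops = f"{c.gflops():.0f}"
    if c.extra_mul:
        # show the two-FLOPS and three-FLOPS figures, as the table does
        flops += f"/{c.gflops(3):.0f}"
    print(f"{c.name}: {c.bandwidth_gbs():.1f} GB/s, {flops} GFLOPS, "
          f"{c.fill_gpixels():.2f} Gpixels/s, {c.bilinear_gtexels():.2f} Gtexels/s")
```

Because every per-card figure is simply the per-GPU figure multiplied by `gpus`, the X2's headline numbers are exactly twice those of a single HD 4870 at the same clocks.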
Performance conjecture
There is nothing intrinsically clever about what ATI is doing here. Much in the vein of the also-twin-GPU Radeon HD 3870 X2, the new card puts two of ATI's fastest single-GPU parts, HD 4870s, onto one board. The difference this time is that the HD 4870 X2 runs at exactly the same frequencies as the single-GPU card (the HD 3870 X2, by contrast, paired a faster core with slower memory than the HD 3870): 750MHz for core and shaders and 3,600MHz (effective) for the smokin' GDDR5 memory.
Each of the card's twin GPUs has its own 512MiB frame-buffer, of course, and the two are linked via a PCIe 2.0 bridge chip mounted on the PCB. The card-level numbers are extraordinary: 2.4TFLOPS of math throughput, 60Gtexels/s of bilinear filtering and 230GiB/s of combined memory bandwidth, comfortably higher than any card that's come before.
It would not be unreasonable to assume that performance will be very much akin to two discrete Radeon HD 4870 boards placed in two-way CrossFire.
The inherent foible of any multi-GPU card is its reliance on software support, which may be sub-standard. For example, should a game not have a CrossFire(X) profile programmed into the drivers, it is likely that 3D acceleration will take place on a single GPU only, leaving the other idle.
Even where a CrossFire profile exists, as it should for the large majority of games, the multi-GPU speed-up may fall well short of the 2x that the specifications suggest, especially with DX10-based titles.
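To make the profile problem concrete, here is a deliberately simple model in Python. It assumes, purely for illustration, that a missing CrossFire profile confines rendering to one GPU and that, with a profile, each additional GPU contributes a fixed fraction of a single GPU's throughput. The function `multi_gpu_fps`, its `scaling_efficiency` parameter and the 40fps baseline are all hypothetical; real scaling varies by game, driver and resolution.

```python
def multi_gpu_fps(single_gpu_fps: float, gpus: int, has_profile: bool,
                  scaling_efficiency: float) -> float:
    """Toy model (hypothetical): without a driver profile only one GPU renders;
    with one, each extra GPU adds `scaling_efficiency` of a single GPU's output."""
    if not 0.0 <= scaling_efficiency <= 1.0:
        raise ValueError("scaling_efficiency must be between 0 and 1")
    if not has_profile or gpus < 2:
        # second GPU sits idle: no speed-up at all
        return single_gpu_fps
    return single_gpu_fps * (1 + (gpus - 1) * scaling_efficiency)


# Invented 40fps single-GPU baseline; efficiencies chosen for illustration only.
for profile, eff in [(False, 1.0), (True, 1.0), (True, 0.4)]:
    fps = multi_gpu_fps(40.0, 2, profile, eff)
    print(f"profile={profile}, efficiency={eff}: {fps:.1f}fps")
```

In this model a 30 or 40 per cent speed-up corresponds to an efficiency of 0.3 or 0.4, and no efficiency at all helps a game that lacks a profile.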
Is a 30 or 40 per cent speed-up enough to justify double the cost of a single card? That's one question that may well arise as we look at the benchmark numbers.
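The cost question reduces to performance per pound. The Python sketch below, assuming the table's estimated £349 for the HD 4870 X2 and £179 for a single HD 4870, divides the speed-up by the price ratio; `relative_value` is an illustrative helper of our own, not a standard metric.

```python
def relative_value(speedup: float, price_multi: float, price_single: float) -> float:
    """Performance per pound of the dual-GPU card relative to the single card.
    1.0 means equal value; below 1.0 means the dual-GPU card is worse value."""
    return speedup / (price_multi / price_single)


# Prices from the table: £349 (estimated) for the HD 4870 X2, £179 for the HD 4870.
for speedup in (1.3, 1.4, 2.0):
    print(f"{speedup:.1f}x speed-up -> {relative_value(speedup, 349, 179):.2f} relative value")
```

At these prices a 30 or 40 per cent speed-up gives roughly 0.67 or 0.72 of a single HD 4870's performance per pound, and the X2 only breaks even at a speed-up of about 1.95x (349 / 179).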