
Review: AMD's ATI Radeon HD 4850 and 4870: bloodying NVIDIA's profits

by Tarinder Sandhu on 25 June 2008, 11:56

Tags: AMD (NYSE:AMD), Sapphire, ATi Technologies (NYSE:AMD)


Bringing them all together

Trotting out the comparison table

Graphics cards | ATI Radeon HD 4870 512MiB | ATI Radeon HD 4850 512MiB | ATI Radeon HD 3850 | ATI Radeon HD 3870 512 | NVIDIA GeForce 9800 GTX+ 512 | NVIDIA GeForce 9800 GTX 512 | NVIDIA GeForce 8800 GTS 512 | NVIDIA GeForce 8800 GT | NVIDIA GeForce 9600 GT
PCIe | PCIe 2.0 (all cards)
GPU clock | 750MHz | 625MHz | 666MHz | 775MHz | 738MHz | 675MHz | 650MHz | 600MHz | 650MHz
Shader clock | 750MHz | 625MHz | 666MHz | 775MHz | 1,836MHz | 1,688MHz | 1,625MHz | 1,500MHz | 1,625MHz
Memory clock (effective) | 3,600MHz | 2,000MHz | 1,656MHz | 2,250MHz | 2,200MHz | 2,200MHz | 1,940MHz | 1,800MHz | 1,800MHz
Memory interface and size | 256-bit, 512MiB, GDDR5 | 256-bit, 512MiB, GDDR3 | 256-bit, 512MiB, GDDR4 | 256-bit, 512MiB, GDDR3
Memory bandwidth | 115GiB/sec | 64GiB/sec | 53GiB/sec | 72.8GiB/sec | 70.4GiB/sec | 70.4GiB/sec | 62.1GiB/sec | 57.6GiB/sec | 57.6GiB/sec
Manufacturing process | TSMC, 55nm | TSMC, 65nm
Transistor count | 965M | 965M | 666M | 666M | 754M | 754M | 754M | 754M | 505M
Die size | 260mm² | 260mm² | 192mm² | 192mm² | 230mm² | 330mm² | 330mm² | 296mm² | 240mm²
Double-precision support | Yes | Yes | Yes | Yes | No | No | No | No | No
DirectX Shader Model | DX10.1, 4.1 | DX10, 4.0
Vertex, fragment, geometry shading (shared) | 800 FP32 scalar ALUs, MADD dual-issue (unified) | 800 FP32 scalar ALUs, MADD dual-issue (unified) | 320 FP32 scalar ALUs, MADD dual-issue (unified) | 320 FP32 scalar ALUs, MADD dual-issue (unified) | 128 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 128 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 128 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 112 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 64 FP32 scalar ALUs, MADD dual-issue + MUL (unified)
Peak GFLOPS | 1,200 | 1,000 | 426.2 | 496 | 470/705* | 432/648* | 416/624* | 336/504* | 208/312*
Data sampling and filtering | 40ppc address and 40ppc bilinear INT8/20ppc FP16 filtering, max 16xAF | 40ppc address and 40ppc bilinear INT8/20ppc FP16 filtering, max 16xAF | 16ppc address and 16ppc bilinear INT8/FP16 filtering, max 16xAF | 16ppc address and 16ppc bilinear INT8/FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF | 56ppc address and 56ppc bilinear INT8/28ppc FP16 filtering, max 16xAF | 32ppc address and 32ppc bilinear INT8/16ppc FP16 filtering, max 16xAF
Peak fillrate (Gpixels/s) | 12 | 10 | 10.656 | 12.4 | 11.8 | 10.8 | 10.4 | 9.6 | 10.4
Peak Gtexel/s (bilinear) | 30 | 25 | 10.656 | 12.4 | 47.2 | 43.2 | 41.6 | 33.6 | 20.8
Peak Gtexel/s (FP16, bilinear) | 15 | 12.5 | 10.656 | 12.4 | 23.6 | 21.6 | 20.8 | 16.8 | 10.4
ROPs | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16
Peak TDP (claimed), W | 160 | 110 | 90 | 105 | Unknown | 155 | 140 | 105 | 95
Power connectors (default clocked) | Two 6-pin | 6-pin | 6-pin | 6-pin | Two 6-pin | Two 6-pin | 6-pin | 6-pin | 6-pin
Multi-GPU | CrossFire - four-board | CrossFire - four-board | CrossFire - four-board | CrossFire - four-board | SLI - three-board | SLI - three-board | SLI - two-board | SLI - two-board | SLI - two-board
Outputs | 2 x dual-link DVI w/HDCP, HDMI 7.1 (native, on GPU) | 2 x dual-link DVI w/HDCP, HDMI 5.1 (native, on GPU) | 2 x dual-link DVI w/HDCP, native HDMI 5.1 (via S/PDIF)
Hardware-assisted video-decoding engine | AMD UVD - full H.264 and VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode
Reference cooler | dual-slot | single-slot | single-slot | dual-slot | dual-slot | dual-slot | dual-slot | single-slot | single-slot
Retail price (default-clocked model) | £175 | £125 | £79 | £89 | £149** | £129** | £139 | £99 | £89


* calculated on a three FLOPS per clock cycle basis.

** based on NVIDIA's recent price-cuts. Current price is £175 for GTX and around £199 for overclocked GTX.

For a look at how the GeForce GTX 280, 260 and 9800 GX2 compare against the new ATI rivals, head on over to here.

Analysis

The nine-GPU table, above, takes in the real volume-selling SKUs from both companies. Priced at between £79 and £175 for default-clocked models, they constitute graphics-card upgrades that most can aspire to, for playing the latest games at reasonable resolutions and image-quality settings.

The Radeon HD 4850 uses 2GHz-rated GDDR3 for 64GiB/s of bandwidth. That's up from the HD 3850 but down from the HD 3870's GDDR4. ATI reckons that the HD 4850 has roughly the same level of usable bandwidth as the HD 3870, for the reasons outlined on the previous page.

We're a little concerned that the HD 4850 remains lopsided from a bandwidth point of view, appreciating just how 'top-heavy' the design is. Surely an architecture like this would thrive on 100GiB/s+?

Segueing nicely, equipped with crazy-speed GDDR5, the Radeon HD 4870 manages to put out 115GiB/s of juicy bandwidth, which is comfortably more than any other card in the sub-£200 sector. 3.6Gbps memory does have its uses, after all. The number is particularly staggering considering the 256-bit interface.
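For readers who want to check the bandwidth numbers, here's a minimal sketch of the usual sum: effective memory clock multiplied by the bus width in bytes. The function and variable names are ours, purely for illustration. Note that the results line up with the table's 64 and 115 figures when read as decimal gigabytes per second; whether the 'GiB' label was meant literally isn't something this sketch can settle.

```python
def memory_bandwidth_gb_s(effective_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in decimal GB/s (illustrative helper, not from the article)."""
    bytes_per_transfer = bus_width_bits / 8            # 256-bit bus moves 32 bytes per transfer
    transfers_per_second = effective_clock_mhz * 1e6   # effective (data-rate) clock
    return transfers_per_second * bytes_per_transfer / 1e9


# Figures taken from the comparison table: effective memory clock and 256-bit interface.
hd4850_bw = memory_bandwidth_gb_s(2000, 256)
hd4870_bw = memory_bandwidth_gb_s(3600, 256)

print(f"HD 4850: {hd4850_bw:.1f}GB/s, HD 4870: {hd4870_bw:.1f}GB/s")
```

The same 256-bit interface feeds both cards, so the HD 4870's advantage comes entirely from the faster GDDR5 data rate.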

Transistor count is up near 1bn, yet die-space is smaller than that of the 65nm-based GeForce 9800 GTX. We already know that the 800 SPs and 40 texturing units take up a vast proportion of these near-1bn transistors.
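To put the transistor-versus-die-size point into numbers, the sketch below divides the table's transistor counts by its die sizes. It's a rough illustration only: the helper name is ours, and density on its own says nothing about how those transistors are spent.

```python
def density_m_per_mm2(transistors_millions: float, die_area_mm2: float) -> float:
    """Millions of transistors per square millimetre (illustrative helper)."""
    return transistors_millions / die_area_mm2


# Table figures: HD 48xx at 965M transistors on 260mm², GeForce 9800 GTX at 754M on 330mm².
hd48xx_density = density_m_per_mm2(965, 260)
gtx9800_density = density_m_per_mm2(754, 330)

print(f"HD 48xx: {hd48xx_density:.2f}M/mm², 9800 GTX: {gtx9800_density:.2f}M/mm²")
print(f"Ratio: {hd48xx_density / gtx9800_density:.2f}x")
```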

The new GPUs' vital stats don't begin to look really scary until we come down to the shader and texturing counts.

Both feature 800 SPs that can dual-issue arithmetic commands. Knowing the core clockspeeds of 625MHz and 750MHz for the HD 4850 and HD 4870, respectively, we arrive at peak ALU rates of 1.0TFLOPS and 1.2TFLOPS - the latter being a figure that's almost twice as high as the GeForce 9800 GTX's.
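The peak-GFLOPS row can be reproduced with a simple product: ALU count, shader clock and FLOPS per clock. The sketch below assumes two FLOPS per clock for a MADD and three when NVIDIA's extra MUL is counted, which is the basis the table's footnote describes; the function name is our own.

```python
def peak_gflops(alus: int, shader_clock_mhz: float, flops_per_clock: int) -> float:
    """Theoretical peak GFLOPS = ALUs x clock (MHz) x FLOPS per clock / 1000."""
    return alus * shader_clock_mhz * flops_per_clock / 1000


# Radeon HD 48xx: 800 ALUs, MADD counted as two FLOPS per clock.
hd4850 = peak_gflops(800, 625, 2)
hd4870 = peak_gflops(800, 750, 2)

# GeForce 9800 GTX: 128 ALUs at 1,688MHz, MADD only and MADD + MUL.
gtx_madd = peak_gflops(128, 1688, 2)
gtx_madd_mul = peak_gflops(128, 1688, 3)

print(f"HD 4850: {hd4850:.0f}, HD 4870: {hd4870:.0f}")
print(f"9800 GTX: {gtx_madd:.0f}/{gtx_madd_mul:.0f}")
print(f"HD 4870 vs 9800 GTX (three-FLOPS basis): {hd4870 / gtx_madd_mul:.2f}x")
```

On the three-FLOPS basis the HD 4870 comes out at a little under twice the 9800 GTX; counting only the MADD on the GeForce would widen the gap further.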

The greater number of texture units means that bilinear (INT8) texture filtering is impressive, but FP16 texturing and general fillrate aren't quite as good, down to the 16ppc processing from the ROPs.
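The fillrate and texel-rate rows follow the same pattern: units per clock multiplied by the core clock. Here's a hedged sketch using the per-clock figures from the table (16 ROPs, 40ppc bilinear INT8 and 20ppc FP16 for the HD 48xx parts); the helper name is hypothetical.

```python
def peak_rate_g_per_s(units_per_clock: int, core_clock_mhz: float) -> float:
    """Peak throughput in G-units/s = per-clock rate x core clock (MHz) / 1000."""
    return units_per_clock * core_clock_mhz / 1000


# Per-clock figures and core clocks from the comparison table.
cards = {
    "HD 4870": {"clock": 750, "rops": 16, "int8": 40, "fp16": 20},
    "HD 4850": {"clock": 625, "rops": 16, "int8": 40, "fp16": 20},
    "9800 GTX": {"clock": 675, "rops": 16, "int8": 64, "fp16": 32},
}

rates = {
    name: {
        "gpixels": peak_rate_g_per_s(c["rops"], c["clock"]),
        "gtexels_int8": peak_rate_g_per_s(c["int8"], c["clock"]),
        "gtexels_fp16": peak_rate_g_per_s(c["fp16"], c["clock"]),
    }
    for name, c in cards.items()
}

for name, r in rates.items():
    print(name, r)
```

With every card in the table sharing 16 ROPs, pixel fillrate tracks core clock alone, which is why the new Radeons don't pull away on that row.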

Both Radeon GPUs consume more under-load power than the cards they replace, and that's why the HD 4870 ships with a dual-slot-taking cooler and twin six-pin PCIe power connectors, allowing partners to ramp it up higher. The 110W HD 4850, however, keeps a single-slot profile and solitary power connector.
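The connector choice makes sense once you tot up the power budget. The sketch below assumes the usual PCIe limits of 75W from the x16 slot and 75W per six-pin connector; those figures come from the PCIe specification rather than from this review, and the function name is ours.

```python
import math

SLOT_POWER_W = 75      # assumption: PCIe x16 slot limit, per the PCIe spec
SIX_PIN_POWER_W = 75   # assumption: limit per six-pin PCIe connector


def six_pin_connectors_needed(tdp_w: float) -> int:
    """Minimum six-pin connectors so that slot + connectors cover the claimed TDP."""
    extra_w = max(0, tdp_w - SLOT_POWER_W)
    return math.ceil(extra_w / SIX_PIN_POWER_W)


# Claimed peak TDPs from the comparison table.
hd4870_connectors = six_pin_connectors_needed(160)
hd4850_connectors = six_pin_connectors_needed(110)

print(f"HD 4870: {hd4870_connectors} x six-pin, HD 4850: {hd4850_connectors} x six-pin")
```

Under those assumptions a single six-pin card tops out at 150W, just shy of the HD 4870's 160W, while twin connectors give a 225W ceiling and the headroom for partners' overclocked boards.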

Most reference-like cards will ship with 512MiB of on-board memory, be it GDDR3 or GDDR5. Partners are free to design custom SKUs with, say, 1GiB of memory - useful in instances where lots of texturing and image-enhancement is taking place.

We also expect to see partners launch factory-overclocked cards from the get-go, opening up an avenue for product differentiation. Along the same lines, you'll see partner-designed coolers on certain models, too.

Summary

ATI has managed to fit an incredible amount of shading and texturing power into the new HD 48xx-series - more than we expected. That shading is helped along by a commensurate increase in texturing, and the use of GDDR5, on the Radeon HD 4870, means it has gobs of bandwidth, too.

If this were a specification-to-specification fight, it would be over before it started. ATI's new GPUs' visceral output cannot be matched by NVIDIA's mid-range, based on 18-month-old technology.

It's one thing to have a huge, huge engine, and another to be able to use it well. Historically, NVIDIA has enjoyed a huge advantage in ensuring that games developers optimise code for its architecture, through a better-supported dev-rel team. The upshot has been that any obvious shortfalls in on-paper specs have been mitigated by tight, efficient code, much to the chagrin of ATI's engineers.

Whatever the current state of play, it's difficult to argue against the brute power of the new mid-range/enthusiast GPUs from ATI; they're comfortably ahead of anything else at the quoted price-points of £125 and £175 for the Radeon HD 4850 and HD 4870, respectively.