The technical bit
Let's get one important naming issue out of the way first. 'R580+', as a specific name for the ASIC itself, doesn't exist. Look at the GPU of an X1950 XTX and you'll see plain old R580 laser-etched onto it. What we have, however, is a new revision of the silicon - A31 - that brings a number of tweaks. The name 'R580+' has been used internally (and occasionally externally) by ATI as an easy way to distinguish this new product from its X1900 brethren.
Here's R580's spec, as taken from our Radeon X1900 XT and XTX review.
ATI R580 GPU Properties | |
---|---|
GPU | ATI R580 |
Process and Fabricator | 90nm @ TSMC, 90GT w/ low-k |
Die Size | 18.5mm x 18.5mm |
Transistor Count | 384 million |
DirectX Shader Model | Shader Model 3.0 |
Basic Configuration (VP/FP/ROP) | 8/48/16 |
Vertex Shader Info | VS3.0 (no texld) 5D FP32, co-issue MADD, branch single cycle trig functions |
Fragment Processor Info | PS3.0 4D FP32, dual-issue ADD+MADD, branch single cycle trig functions |
ROP Info | 6x Int8/FP16/FX16/Int10 MSAA (2 subsamples/cycle) 2x Z-only rate FP/FX16 blender |
Texture processing | 16 FP32 address units, 4 samplers Bilinear filter for integer samples |
Memory Interface | 256-bit, 512-bit ring bus GDDR->GDDR4 |
Display output | 2x dual-link DVI TMDS transmitters ATI Avivo |
GDDR4
With the new spin comes work on R580's ring-bus memory controller, intended to improve its operation with GDDR4. The fourth version of GDDR RAM was always a design consideration in the memory controller, but now that it's made it to mass production, ATI has taken the opportunity to ensure the core works optimally with it.
What's GDDR4 all about, then? Put simply, it's a higher bandwidth, lower power technology than GDDR3. Clocked at 1GHz, the X1950 XTX's 512MiB of GDDR4 can provide up to 64GB/s of memory bandwidth across the 256-bit interface. It sends (and receives) more data per cycle than GDDR3, allowing the memory's own core to be half-clocked. As such, the nominal core voltage for GDDR4 is lower than its predecessor's, at 1.5V (though this is bumped up to 1.9V in gaming and overclocking scenarios). The result is lower power consumption per clock compared to the older memory.
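For the curious, here's a back-of-the-envelope sketch of where that bandwidth number comes from. It assumes two transfers per clock on every data pin (double data rate) and uses the 256-bit interface width from the spec table; the function name is ours, purely for illustration. It reproduces the 64 figure quoted above and shows that it comes out in decimal gigabytes.

```python
def peak_bandwidth_bytes_per_s(clock_hz, bus_width_bits, transfers_per_clock=2):
    """Theoretical peak bandwidth in bytes per second.

    Assumes double data rate (two transfers per clock) unless told otherwise.
    """
    return clock_hz * transfers_per_clock * bus_width_bits // 8


# 1GHz memory clock on a 256-bit external interface
bw = peak_bandwidth_bytes_per_s(1_000_000_000, 256)

print(bw / 10**9)            # 64.0 decimal gigabytes per second
print(round(bw / 2**30, 1))  # 59.6 binary gibibytes per second
```

This is a theoretical peak only; real-world throughput depends on access patterns and on how well the memory controller keeps the bus busy.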
Also reducing power consumption is a technique known as Data Bus Inversion (DBI). Holding a logic-0 in memory demands more power than holding a logic-1, so ideally it would be nice to store more 1s than 0s. With the help of DBI this is possible much of the time. When any 8-bit data segment is received by memory, if the number of 0s is greater than four, the data is inverted and a DBI flag is set to '1'. So, an 8-bit field with five 0s and three 1s becomes a field with five 1s and three 0s, lowering power consumption. Upon fetching that data back, if the DBI flag is set then the data is inverted once again, transforming it back to its original form.
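The DBI rule described above is simple enough to sketch in a few lines. This is only an illustration of the logic as we've described it, not a model of the actual hardware; the function names are our own.

```python
def dbi_encode(byte):
    """Return (stored_byte, dbi_flag) for one 8-bit data segment.

    If the segment holds more than four 0s, every bit is flipped and
    the flag is set so the inversion can be undone later.
    """
    zeros = 8 - bin(byte & 0xFF).count("1")
    if zeros > 4:
        return byte ^ 0xFF, 1
    return byte, 0


def dbi_decode(stored, flag):
    """Undo dbi_encode: flip the bits back if the flag is set."""
    return stored ^ 0xFF if flag else stored


# Five 0s and three 1s: gets inverted to five 1s and three 0s
stored, flag = dbi_encode(0b00010011)
print(f"{stored:08b}", flag)                # 11101100 1
print(f"{dbi_decode(stored, flag):08b}")    # 00010011
```

Note that after encoding, no stored segment ever carries more than four 0s, which is where the power saving comes from.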
GDDR4 chips use half the number of address pins that GDDR3 does. As such, addressing the memory takes two clocks rather than one, doubling the addressing latency. The RAM also waits for a good clock strobe before it will put anything onto the data pins during a read operation, so a number of clock cycles can pass before a clean strobe arrives, further increasing latency. Of course, it would be fair to call it swings and roundabouts, because increased clock speeds will (although not always initially) negate the increased latency.
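To see why higher clocks can cancel out extra cycles of latency, it helps to convert cycles into nanoseconds. The sketch below does just that. The cycle counts and clock speeds are made-up round numbers chosen purely for illustration, not measured GDDR3 or GDDR4 timings, and the function names are ours.

```python
def latency_ns(cycles, clock_mhz):
    """Absolute latency in nanoseconds for a given cycle count and clock."""
    return cycles * 1000 / clock_mhz


def break_even_clock_mhz(old_cycles, old_clock_mhz, new_cycles):
    """Clock the newer memory needs to match the older memory's latency."""
    return old_clock_mhz * new_cycles / old_cycles


# HYPOTHETICAL figures: older memory needs 10 cycles at 800MHz,
# newer memory needs 12 cycles but runs at 1000MHz.
print(latency_ns(10, 800))                # 12.5
print(latency_ns(12, 1000))               # 12.0
print(break_even_clock_mhz(10, 800, 12))  # 960.0
```

In this invented case the newer memory spends more cycles waiting but still responds sooner in absolute terms, and anything above the break-even clock is a net win.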
Samsung is currently mass-producing GDDR4 parts, with Hynix ramping up to full production as we speak. For now, you'll find Samsung's BC09 GDDR4 chips on the X1950 XTX, in the same 512MiB quantity we're accustomed to on such cards.
Core clock
X1950 XTX's core clock remains the same as the X1900 XTX's, at 650MHz. However, the new silicon spin does seem to reduce power consumption slightly too, although we have no hard proof for you on that one just yet.
With the same clock speed then, it's down to the new memory and a tweaked memory controller to give the X1950 XTX the edge. We can expect scenarios where memory bandwidth is gobbled up to see a gain with the new product. This includes high resolutions and where anti-aliasing & anisotropic filtering are cranked up, or to put it another way, the kinds of scenarios you'd subject a card like this to.
Before we see if more girth really works, shall we take a look at the card?