
Review: The introduction of 512MiB consumer graphics hardware

by Ryszard Sommefeldt on 4 May 2005, 00:00

Are games going to start using more than 256MiB of basic data per frame?

Even assuming heavy reuse of texture data from a GPU's texture cache, there's still plenty of scope for a game to need more than the current amount of framebuffer space per frame. 128MiB is by far the most pervasive memory configuration in consumer graphics today, and games developers aren't yet loading up the hardware with everything they possibly can. But what if they did, and is it starting to happen?

Programmable graphics processors like ATI's R4-series and NVIDIA's GeForce 6-series have given developers and 3D graphics researchers the power and capability to come up with innovative new ways to construct a 3D scene, especially in terms of lighting and shadowing, in real-time with interactive framerates. There are numerous papers on the world wide web that discuss things like real-time radiosity, real-time global illumination (or decent approximations of it at least), shadow mapping, shadowing using spherical harmonics and shadowing using precomputed radiance transfer (PRT). All of them require not only significant amounts of high-quality artist-generated data to look good, but also all manner of intermediate data storage while you're building the frame to display.

Let's pick one easy-to-understand method of applying shadows to a 3D scene, perspective shadow mapping, which a game engine may combine with other methods in the same frame to light the world. We'll examine the data usage per frame just for the shadowing, regardless of the other major art assets or anything else needed to shade the scene. This lets you see that, even with a data space that changes per frame, the costs of the lighting model alone are high.

A simple examination of the changing data space needed for shadow mapping

I talk about changing data space here since you're creating new shadow map data per frame displayed, and depending on the view frustum, the work done to create the shadow map can vary. In other words, it's not a fixed workload per frame and can be optimised. To understand that, ask why you'd calculate the light contribution for a light source that's fully occluded both by objects in the scene (it's blocked by a wall, say) and from the viewing perspective (you can't physically see the light anyway, so its contribution to the frame's lighting is lessened). You only really want to use large resolution shadow maps for large area lights, such as the sun, with smaller maps useful for point or directional lights.

I have a second example for a future article that computes the per-frame cost for per-pixel PRT shadowing on a fairly complex scene (time permitting!).

Shadow Mapping Data Size Example


Before we delve into numbers, a little on how perspective shadow mapping works. Using deferred shading, where you accumulate contributions to the final frame in separate render targets before combining them for display, you render the scene from the perspective of the viewer. You then render it again from the perspective of any light that you want to calculate shadow contribution from, into a high-resolution, high-precision render target (2Kx2K or 4Kx4K at 32-bit works best for area lights at large view sizes such as 1600x1200). That's your depth map. You then sample at the intersection of the geometry in the scene from the view with the shadow map projected by the light. The result tells you whether you're in shadow at that point. When using a cube map, you render the scene onto the faces of the cube texture instead, to create your depth map.
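The depth comparison at the heart of that description can be sketched in a few lines. This is a minimal CPU-side illustration in Python, not the code from any Direct3D sample: the function name `in_shadow`, the list-of-lists depth map, the [0, 1] light-space coordinates and the small depth bias are all assumptions made for the example.

```python
def in_shadow(depth_map, u, v, depth_from_light, bias=0.005):
    """Return True if a point is shadowed with respect to one light.

    depth_map        -- square 2D list; each texel holds the depth of the
                        nearest surface seen from the light (0 = at the light)
    u, v             -- the point's position projected into the light's
                        view, both in [0, 1]
    depth_from_light -- the point's own depth as seen from the light
    bias             -- hypothetical small offset that stops a surface
                        shadowing itself through rounding error
    """
    size = len(depth_map)
    # Nearest-texel lookup, clamped so u == 1.0 or v == 1.0 stays in range.
    x = min(int(u * size), size - 1)
    y = min(int(v * size), size - 1)
    nearest_occluder = depth_map[y][x]
    # If something closer to the light covers this texel, the point is shadowed.
    return depth_from_light - bias > nearest_occluder


# Tiny 2x2 depth map: an occluder at depth 0.5 covers the top-left texel only.
depth_map = [[0.5, 1.0],
             [1.0, 1.0]]
print(in_shadow(depth_map, 0.1, 0.1, 0.8))  # point lies behind the occluder
print(in_shadow(depth_map, 0.9, 0.1, 0.8))  # nothing between point and light
```

On real hardware the lookup happens per pixel in a shader, and the map is the 2Kx2K or cube render target described above rather than a Python list.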

So I set up a simple scene in Direct3D, using the algorithm from a Microsoft sample on shadow mapping, with a 2Kx2K shadow map for the sun and a smaller 512x512x6 cube map to hold the shadow maps (6 of them, one per face) for each of three small point lights (the point lights are fixed). Geometry cost is less than 20K vertices per frame, and there's geometry to fully occlude the point lights. So it's just a simple example.

The data costs for the shadow maps are therefore relatively fixed. The 2Kx2K shadow map is a 16MiB texture, computed once per frame for the area light, plus 1.5MiB for each point light's cube map. Only if the engine finds a point light fully occluded does it skip rendering that cube map's contribution to shadowing. So the shadow map cost for my example varies between 16 and 20.5MiB per frame. Add in more unoccluded point lights and the data cost equals 16MiB + 1.5MiB x point light count.
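The arithmetic behind those figures can be checked with a short Python sketch. The helper names `texture_mib` and `shadow_map_cost_mib` are made up for illustration. One assumption to flag: the article does not state the cube map's texel format. Its 1.5MiB figure matches one byte per texel (512 x 512 x 6 is 1.5Mi texels), whereas the same cube map at 32 bits per texel would be 6MiB. The cost function below simply takes the article's 1.5MiB as given.

```python
MIB = 1024 * 1024


def texture_mib(width, height, bytes_per_texel, faces=1):
    """Uncompressed size in MiB of a texture with no mip chain."""
    return width * height * bytes_per_texel * faces / MIB


def shadow_map_cost_mib(unoccluded_point_lights, sun_mib=16.0, cube_mib=1.5):
    """Per-frame shadow map cost: one sun map plus one cube map per
    point light that the engine did not find fully occluded."""
    return sun_mib + cube_mib * unoccluded_point_lights


sun = texture_mib(2048, 2048, 4)              # 2Kx2K at 32 bits per texel
cube_8bit = texture_mib(512, 512, 1, faces=6)   # assumed 1 byte per texel
cube_32bit = texture_mib(512, 512, 4, faces=6)  # same cube map at 32 bits

print(sun, cube_8bit, cube_32bit)
for lights in range(4):
    print(lights, "point lights:", shadow_map_cost_mib(lights), "MiB")
```

With zero to three unoccluded point lights, the total runs from 16MiB to 20.5MiB, matching the range quoted above.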

While that might not sound like a lot, you then have to add the cost of material data and anything like a normal map (often a very large cost if you have a lot of normal-mapped geometry on screen), gloss map, environment map (which can be a full-screen sized texture) or similar on top, all of which need to be in card memory for highest performance. With those included, using shadow mapping to render your scene, a technique that's gaining favour in many new game engines, starts to exert memory pressure on 256MiB cards. And remember that my single area light and three fixed point lights make for a fairly simple light setup.

Summary

Obviously this is just a very simple example. It doesn't take into account scaling the shadow map for resolution, reusing the shadow map per frame, using other shadowing algorithms at the same time (I alluded to using PRT and spherical harmonics above, which mostly have a shader cost, not a large data cost like shadow mapping), the number of passes taken to render (texture reuse) or render targets being combined (which are often the largest memory cost to factor in, if they're full-screen). Hopefully, though, it lets you see how just one popular new method for shadowing imposes a fairly large fixed-cost penalty in terms of on-card data storage needed (over 10% of the framebuffer size for a simple area light and one point light in my small example). That's space that can't be used by art assets.
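As a rough check on that percentage, here is a small Python sketch using the article's own figures of 16MiB for the sun's shadow map and 1.5MiB for one point light's cube map. The function name is hypothetical, and which board size the "over 10%" refers to is not stated in the text, so both 128MiB and 256MiB are shown.

```python
def share_of_memory_percent(cost_mib, board_mib):
    """Shadow map cost as a percentage of total on-card memory."""
    return 100.0 * cost_mib / board_mib


# One area light (16MiB) plus one unoccluded point light (1.5MiB).
one_light_cost = 16.0 + 1.5

for board_mib in (128, 256):
    share = share_of_memory_percent(one_light_cost, board_mib)
    print(f"{board_mib}MiB board: {share:.1f}%")
```

On these numbers the share is above 10% on a 128MiB board and below it on a 256MiB board, so the quoted figure appears to fit the 128MiB case.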

Many people assume the visible textures (materials) are the only thing a card stores in its memory. I wanted to show you that's not the case at all in a modern 3D game title.

Therefore even with things like texture compression, possible compression of the matrices you use to calculate the shadow contribution if you're using PRT for example, normal map compression and geometry compression, future games are likely to carry a massive data cost per frame. Doom3 hints at it now, depending on your settings, and future games will cram in larger art assets in order to boost visual quality.

Even assuming that, due to normal mapping, geometry size costs aren't going to massively increase in games in the near future, texture data and the methods to render it are going to give you, in at least some titles and circumstances, significant memory pressure on a 256MiB board.

While you can argue that more research into compression (in either software or hardware) would be money well spent, having the consumer fork out for a board with a larger on-card memory size is cheaper for the IHV and the developer, and makes the board and chip vendors more money! Let's examine that particular cost, and related knock-on effects.

The upshot of 256MiB becoming the most pervasive memory size (whenever that happens) will be the introduction of the art assets and rendering techniques that will really tax that memory size and upwards, so upping resolution and AA won't be your only means to push a 256MiB board.