Windows Vista: DirectX10 D3D Intro

Highlights of the D3D10 Programmable Graphics Pipe

The following list is by no means complete; rather, it's a quick and dirty look over the highlights of D3D10.

Geometry Shader

As well as vertex and pixel shader stages, there's also a geometry shader (GS) stage in D3D10. Sitting between the VS and PS, pre-rasterisation, the GS can pretty much be thought of as a geometry amplifier (although it can de-amplify too), producing multiple new vertices from one input primitive from the VS.

Output vertex data can be of a different type than input, allowing a GS program to create completely new primitives from input data, with the same instruction set as VS and PS. That means the GS can texture, and as output it can also 'render' or stream out (a subset of data) into up to four new output buffers, as well as pass on the full results to the raster hardware and then PS.
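
To make the amplification idea concrete, here is a minimal CPU-side sketch in Python. It is not HLSL and not how a real GS is written; it only emulates the shape of the work, one input primitive in and zero or more vertices out. The function names and the point-to-quad expansion are assumptions chosen for illustration.

```python
# CPU-side emulation of geometry amplification, for illustration only.
# All names here are hypothetical; a real GS would be written in HLSL.

def expand_point_to_quad(center, half_size):
    """Emit four corner vertices (triangle-strip order) around one point."""
    cx, cy = center
    return [
        (cx - half_size, cy - half_size),
        (cx - half_size, cy + half_size),
        (cx + half_size, cy - half_size),
        (cx + half_size, cy + half_size),
    ]


def geometry_stage(points, half_size, cull=lambda p: False):
    """Run the per-primitive function over every input point.

    A point rejected by `cull` emits nothing, which models de-amplification.
    """
    out = []
    for p in points:
        if cull(p):
            continue  # de-amplify: this primitive produces no output
        out.append(expand_point_to_quad(p, half_size))
    return out


# Two input points become two quads of four vertices each.
quads = geometry_stage([(0.0, 0.0), (2.0, 2.0)], 0.5)
```

The output vertex type here happens to match the input, but nothing in the structure requires that; the per-primitive function is free to emit a different vertex layout, which is the point the paragraph above makes.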

That amp/de-amp and stream out ability, pre-rasterisation, is there to let developers (and therefore artists engaged in using that development for something better on screen) be more flexible and efficient with what happens to geometry, in a number of ways.

Think hardware-accelerated tessellation (possibly a bad example for first-gen implementations of the GS in hardware, but bear with me since it's geometry amplification); the oft-quoted per-primitive, on-GPU material system; something like instanced animation of multiple meshes; or maybe an on-GPU particle system or GPU-based motion blurring, as just a few possible uses of the GS.

Essentially the GPU and developer have more hardware acceleration for vertex processing after initial VS work has taken place, without another pass through the VS after a round-trip back to the CPU.

Therefore the D3D10 pipe is something new entirely: IA->VS->GS (possible stream out)->raster->PS->OC->ROP. And that's without explaining the input assembler (IA), the pre-ROP output combiner stage (including more render targets) and the per-stage input/output semantics, which depend on what's active and doing what, as new things in D3D10.

The GS stage and stream out are arguably the most important, though, which is why we focussed on those.

Sample from anywhere, render to anywhere

Well, not quite anywhere, but the new Shader Model in D3D10 -- Shader Model 4.0 -- does allow read and write from more generalised data representations than D3D9 and its supported Shader Models. At any point in the programmable stages of the pipe, shader programs can sample data, and access to memory (read or write) can be done using data 'views'.

A view defines the areas in memory to work in, be they slices of render targets, 1D texture arrays, MIP chains or other well-defined resource types. Shader programs also have access to MSAA sample data (read and write), before the final colour write into the framebuffer, enabling MSAA-based effects and data use.
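
As a rough mental model of a view, the Python sketch below treats a resource as a grid of sub-images (array slice by MIP level) and a view as a window onto part of that grid. The class names and fields are hypothetical and are not the D3D10 API; the sketch only shows the idea of addressing memory relative to a view rather than to the whole resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextureArray:
    """Hypothetical resource: texels[slice][mip] holds one sub-image."""
    texels: tuple


@dataclass(frozen=True)
class View:
    """Hypothetical view: a window of array slices and MIP levels."""
    resource: TextureArray
    first_slice: int
    slice_count: int
    first_mip: int
    mip_count: int

    def read(self, slice_index, mip_index):
        # Indices are relative to the view, not to the whole resource.
        if not 0 <= slice_index < self.slice_count:
            raise IndexError("slice outside view")
        if not 0 <= mip_index < self.mip_count:
            raise IndexError("mip outside view")
        row = self.resource.texels[self.first_slice + slice_index]
        return row[self.first_mip + mip_index]


# Four slices with three MIP levels each; strings stand in for image data.
tex = TextureArray(
    texels=tuple(tuple(f"s{s}m{m}" for m in range(3)) for s in range(4))
)
# The view exposes only slices 2..3 and MIP levels 1..2.
view = View(tex, first_slice=2, slice_count=2, first_mip=1, mip_count=2)
```

Anything outside the window is simply not addressable through that view, so the same underlying memory can be handed to different stages with different windows.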

That means simply more ways to do I/O with useful data.

State change overhead fixed?

Now, in our quick and dirty overview of some of the key points for developers to consider when writing shader programs for the D3D10 pipe, we come to the fixing of the bane of a D3D9 developer's life.

Microsoft have significantly changed how you set render state and submit geometry batches and other data to the chip. Using a combination of driver model improvements only available on Vista and runtime changes, the runtime layer now spends less time on the CPU and more time making sure the render hardware has what it needs to draw pixels. Be they state blocks or general render data, they should now get to the hardware quicker and be cached there for longer, aiding performance (possibly significantly).

So, hopefully, just rendering a small batch of tris and swapping state more often isn't a hindrance to performance.

Bit/integer instructions

The new Shader Model, which defines the instruction set for the VS, GS and PS, adds an integer-only instruction set that can be used for flow control and for packing (including compression schemes) and unpacking data on the GPU, among other things.

Think C-like bitwise operations such as bit shifting and logic ops, within shader programs while working on integer data, allowing for more robust programming.
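
As a small illustration of the kind of packing those shift and logic ops allow, the Python sketch below packs four 8-bit colour channels into one 32-bit integer and unpacks them again. It is written in Python rather than HLSL purely for illustration, and the function names and the channel order (red in the low byte) are assumptions, not anything D3D10 mandates.

```python
def pack_rgba8(r, g, b, a):
    """Pack four 0..255 channel values into one 32-bit unsigned integer.

    Red sits in the lowest byte and alpha in the highest (an assumed layout).
    """
    return (
        ((a & 0xFF) << 24)
        | ((b & 0xFF) << 16)
        | ((g & 0xFF) << 8)
        | (r & 0xFF)
    )


def unpack_rgba8(packed):
    """Recover the four channels with shifts and masks."""
    return (
        packed & 0xFF,
        (packed >> 8) & 0xFF,
        (packed >> 16) & 0xFF,
        (packed >> 24) & 0xFF,
    )


packed = pack_rgba8(0x11, 0x22, 0x33, 0x44)
channels = unpack_rgba8(packed)
```

The same mask-and-shift pattern extends to custom layouts such as 10:10:10:2 or to simple compression schemes, which is where having real integer ops in the shader pays off.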

Be gone, caps bits, be gone!

"DX10 has no caps bits! No, wait, there's a couple for something I can't quite remember. Errr....". So went the musings of a dev-rel deity, one day. But for the most part, if we ignore the parts of DX10 that aren't D3D, he's absolutely correct, since to advertise yourself as a D3D10 device you must support all base features with almost no exceptions. No querying for available texture formats, or MSAA sample counts, or whether the hardware supports non-power-of-2 textures, or any number of the hundreds (literally) of device caps that exist in D3D9.

So developers can get on with writing code that does something useful, rather than code that has to navigate a minefield of supported this and not supported that. Any developer will tell you that the more time spent writing useful code, the better.

I can hear the cheers, just for that.

No more ASM shaders, just HLSL

With D3D10, you can't feed the shader core ASM shader programs. You must provide HLSL versions instead via a compiler interface found in the main D3D .dll. Optimisation opportunities for hardware are therefore entirely out of your hands, with the compiler and driver assembler doing all of that work.

For some developers this is actually a hindrance, but for the vast majority a complete higher-level representation without any means to poke and possibly break performance is just the ticket.

Support for IEEE-754 special numbers in SM4.0 shaders

A quick final point: a developer must now handle the case where a value in a shader becomes infinity or NaN (not a number; think of a result like the square root of a negative number in floating point). Both NaN and ∞ are valid values in an SM4.0 shader. Denormals are the exception, since D3D10 still flushes them to zero.
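
The behaviour of these special values can be sketched on the CPU. The Python below is an illustration only: it rounds values to 32-bit floats and models flush-to-zero with a hypothetical helper, `flush_denormal`, which is an assumption about how one might emulate the rule and not a D3D10 function.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126  # smallest positive normal 32-bit float


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_denormal(x):
    """Model flush-to-zero: denormal 32-bit values become a signed zero."""
    x = to_float32(x)
    if x != 0.0 and abs(x) < FLT_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


inf = float("inf")
nan = inf - inf  # infinity minus infinity is one way to produce NaN

# NaN compares unequal to everything, including itself, so an equality
# test silently fails; use an explicit NaN check instead.
nan_equals_itself = (nan == nan)
tiny = flush_denormal(1e-40)  # 1e-40 is below the 32-bit normal range
```

The practical consequence for shader code is that comparisons and clamps need to be written with NaN and ∞ in mind, since a NaN that slips through tends to propagate into every later result that uses it.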