PCI Express Bandwidth Testing and the Video Processor
A quick chat with someone the other day gave me some clues as to the tools I'd need to test PCI Express bus performance on these new GPUs. One comment in particular made it worth investigating. The claim was that the HSI interface NV45 uses to bridge the GPU's native AGP8X interface to PEG16X really does limit performance. The limit was said to be worst when the GPU writes back to the PEG16X host, meaning your mainboard's PEG16X controller and CPU subsystem.
Serious Magic, a vendor of video editing software and related tools, created a benchmark a couple of years ago that tested GPU-to-host writeback performance. Back then, that performance was seriously limited by the graphics card's driver. The benchmark prompted The Tech Report to investigate and write a famous article on driver performance during host writebacks. That same tool is a fine stress test of the GPU-to-host interface today, even on PCI Express. In fact, it is especially useful on PCI Express, as you'll soon see.
TexBench 1.3 is used to test host-to-GPU texture uploads and measure the bandwidth available in that direction. I use a quad-texture setup in which the host uploads 8MB of 128x128 textures in burst mode. That should reveal any limitations in bus bandwidth.
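The upload test's workload size comes down to some simple arithmetic. The sketch below shows how many 128x128 textures fit in one 8MB burst. It rests on several assumptions that the test description does not state: 32-bit RGBA texels, no mipmaps, and "8MB" taken as 8 × 1024 × 1024 bytes. The texel format TexBench actually uses may differ.

```python
# Sketch: how many 128x128 textures fit in the 8MB upload burst.
# Assumptions (not stated for TexBench): 32-bit RGBA texels, no mipmaps,
# and "8MB" meaning 8 * 1024 * 1024 bytes.

TEXTURE_WIDTH = 128
TEXTURE_HEIGHT = 128
BYTES_PER_TEXEL = 4                  # assumed 32-bit RGBA
BURST_BYTES = 8 * 1024 * 1024        # assumed binary megabytes


def texture_bytes(width: int, height: int, bytes_per_texel: int) -> int:
    """Size in bytes of one uncompressed, non-mipmapped texture."""
    return width * height * bytes_per_texel


def textures_per_burst(burst_bytes: int, width: int, height: int,
                       bytes_per_texel: int) -> int:
    """Number of whole textures that fit in one upload burst."""
    return burst_bytes // texture_bytes(width, height, bytes_per_texel)


if __name__ == "__main__":
    size = texture_bytes(TEXTURE_WIDTH, TEXTURE_HEIGHT, BYTES_PER_TEXEL)
    count = textures_per_burst(BURST_BYTES, TEXTURE_WIDTH, TEXTURE_HEIGHT,
                               BYTES_PER_TEXEL)
    # Under these assumptions: 65536 bytes per texture, 128 textures per burst
    print(f"{size} bytes per texture, {count} textures per burst")
```

Changing `BYTES_PER_TEXEL` to 2 (a 16-bit format) would double the texture count for the same burst size.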
Host-to-GPU texture upload performance

All three bridged GPUs sit at 266.67MB/sec. Each uses NVIDIA's HSI (High Speed Interconnect) ASIC to convert the GPU's native AGP8X interface into PEG16X. The only native PCI Express GPU, NV43, improves slightly on that limit, but not by much. That's a far cry from PEG16X's theoretical maximum of 4GB/sec in that direction, but it's correct for the GPUs in question.
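The 4GB/sec figure for PEG16X follows from the PCI Express 1.x signalling parameters. Each lane runs at 2.5 GT/s per direction, and 8b/10b encoding spends 10 line bits on every 8 data bits. The sketch below derives that peak and expresses the measured 266.67MB/sec as a fraction of it. The helper names are illustrative, and MB here means 10^6 bytes.

```python
# Sketch: theoretical one-direction bandwidth of a PCI Express 1.x link,
# and a measured rate expressed as a fraction of that peak.

GT_PER_SEC_PER_LANE = 2.5e9   # PCIe 1.x raw signalling rate per lane
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b: 8 data bits per 10 transmitted bits
BITS_PER_BYTE = 8


def pcie1_bandwidth_mb_per_sec(lanes: int) -> float:
    """Peak one-direction data bandwidth in MB/s (10^6 bytes/s)."""
    bytes_per_sec = (lanes * GT_PER_SEC_PER_LANE * ENCODING_EFFICIENCY
                     / BITS_PER_BYTE)
    return bytes_per_sec / 1e6


def fraction_of_peak(measured_mb_per_sec: float, lanes: int = 16) -> float:
    """Measured throughput as a fraction of the link's theoretical peak."""
    return measured_mb_per_sec / pcie1_bandwidth_mb_per_sec(lanes)


if __name__ == "__main__":
    peak = pcie1_bandwidth_mb_per_sec(16)
    print(f"PEG16X peak per direction: {peak:.0f} MB/s")  # prints 4000
    # 266.67MB/sec is the bridged upload rate quoted in the text
    print(f"Bridged upload rate: {fraction_of_peak(266.67):.1%} of peak")
```

A single lane works out to 250MB/sec per direction, so a x16 link is simply sixteen of those.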
GPU-to-host texture download performance

The GPU-to-host writeback case, traditionally a weak spot on AGP hardware, shows a much more pronounced difference. The 5900 PCX, which is NV35 bridged using the HSI, writes back to the PEG16X host slightly faster than the NV45 boards. The margin is a couple of dozen MB/sec, or around 7% in real terms.
However, the only native PCI Express GPU, NV43, writes back to the PEG16X host at a much higher rate, more than double that of its bridged siblings. It's still some way off the theoretical peak of 4GB/sec, by a factor of 8. Even so, there is a clear, measurable twofold difference between a native PCI Express GPU and a bridged one. The HSI limits things; there's no doubt about it.
The Video Processor
The video processor on early NV40 and NV45 samples is broken, on both retail boards and the reference boards shipped to reviewers. Fire up an HDTV video source on either GPU and watch CPU usage spike to nearly 100%. GPU offload of video processing tasks just isn't being performed. That includes the decode assist needed for smooth HDTV playback, even on very fast CPUs. NVIDIA acknowledge that this is the case with early NV40 (units that shouldn't have made it to retail), but insist it's working in retail samples and in NV45.
My own testing indicates otherwise. That holds for every reference board I have (the full gamut on both PCI Express and AGP) and for any retail samples that pass through my hands for review. CPU usage is very high; the VP isn't working.
So testing with NV43 was always on the cards. The test? Playing back the original HDTV T2 trailer in Windows Media Player 9, along with two transcoded versions (XviD and MPEG2) at their native resolutions. Monitoring software logged CPU usage over just under two minutes of playback for each of the three high-resolution streams. The results shown are the average for the original WMV9 source.
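The reported figure is an average of periodic CPU-usage samples taken during playback. The sketch below shows only that averaging step. The function name is hypothetical, and it assumes each sample is a whole-system utilisation percentage. The article doesn't say how often its monitoring software sampled.

```python
from statistics import mean


def average_cpu_utilisation(samples: list[float]) -> float:
    """Mean of periodic CPU-usage samples, each a percentage in [0, 100]."""
    if not samples:
        raise ValueError("no CPU samples were recorded")
    if any(not 0.0 <= s <= 100.0 for s in samples):
        raise ValueError("CPU samples must be percentages between 0 and 100")
    return mean(samples)
```

The article doesn't name the monitoring tool it used. On a current system, one option is the third-party psutil package: calling `psutil.cpu_percent(interval=1.0)` in a loop for the length of the clip yields one system-wide percentage per second to feed into this function.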

It appears to be working correctly with NV43, with average CPU utilisation during playback below 40%. The GPU is assisting the CPU in decoding the HDTV source video (in this case 1080p), leaving the CPU free to do other tasks at the same time.
The VP is clearly broken in NV40 and NV45, even with the 65.76 driver. Even NV35 offers more assistance than either of those parts.