Decode
Avivo-supporting GPUs purport to have a programmable video decode engine paired with fixed function paths for decode of H.262, VC-1 and the about-to-be-very-venerable H.264 AVC. That's pretty much it in terms of how to explain the Decode step in Avivo, really.
Obviously the GPU takes input video from somewhere: the Encode stage (if it needs displaying after encode, for real-time previews), the network (network transmission is only getting more important), disk (the same as decode from the network, but with a better guarantee of the video data actually arriving) or whatever. It then passes that video through the fixed function hardware present for decode. After that, the programmable graphics hardware and some programmable silicon inside the decoder can be used to support other video formats, or implement quality tweaks.
The flow looks something like the following image. The interconnects might not be as strictly one-to-the-next as drawn, but the stages do exist in Avivo silicon for decoding video.
(mostly) Fixed-function decode for H.264
ATI's documentation for the fixed-function Decode stage in Avivo is thin on the ground. It mentions "comprehensive decode support" for the CODECs listed above, but that's about it. However, their publicly available whitepaper on H.264 goes into a bit more detail. The decode stages for that video format are as follows.
Reverse entropy means rebuilding the larger dataset that was squeezed down by the entropy coding in the encode stage outlined on the previous page. I mentioned CABAC on the previous page too, so a little explanation is worth it. Context-adaptive binary arithmetic coding (lossless compression basically), to give its full sexy name, allows higher compression rates than 'basic' H.264, but at a computational cost. CABAC works by keeping running probability estimates for the data it's coding, chosen by context and updated as it goes, so the decoder has to track exactly the same state bit by bit. Per-frame, that's a nice added cost in decode, and it has to be done faster than real-time so you don't drop frames. R5-series GPUs from ATI have dedicated silicon for that.
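To give a feel for why the "adaptive" part matters, here's a minimal sketch in Python. It is not the real CABAC state machine from the H.264 spec (that uses table-driven probability states and a range coder); it just adds up the ideal arithmetic-coding cost of a bit stream when the probability model learns from the bits already seen. The function name adaptive_cost_bits and the simple count-based model are assumptions made for illustration.

```python
import math

def adaptive_cost_bits(bits, contexts=None):
    """Ideal coding cost, in bits, under an adaptive per-context model.

    Hypothetical illustration only: each context keeps counts of the 0s
    and 1s it has seen (both starting at 1) and uses them as its
    probability estimate. Real CABAC uses fixed state-transition tables.
    """
    counts = {}   # context id -> [zeros seen + 1, ones seen + 1]
    total = 0.0
    for i, bit in enumerate(bits):
        ctx = contexts[i] if contexts is not None else 0
        c = counts.setdefault(ctx, [1, 1])
        p = c[bit] / (c[0] + c[1])   # current estimate for this bit value
        total += -math.log2(p)       # ideal arithmetic-coding cost
        c[bit] += 1                  # adapt; a decoder must mirror this update
    return total

# A fixed 50/50 model would spend exactly 1 bit per symbol.
skewed = [0] * 100
print(adaptive_cost_bits(skewed))

# Alternating bits look random to one shared model, but are fully
# predictable once each position gets its own context.
alternating = [0, 1] * 50
parity = [i % 2 for i in range(100)]
print(adaptive_cost_bits(alternating), adaptive_cost_bits(alternating, parity))
```

The run of 100 zero bits comes out at roughly 6.7 bits instead of 100, and the alternating stream only compresses once the context split is in place. The flip side is the serial dependency: every bit's probability depends on the bits before it, which is the sort of workload that suits dedicated silicon better than a wide parallel shader array.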
iDCT is next. It's computationally cheap (pretty much the same step during decode as iDCT for MPEG-2/H.262) as a single function, but applied to H.264 it consumes more CPU cycles. Motion compensation is the most computationally expensive task in H.264 decode. It's not a fixed cost either, so you're doing varying amounts of analysis in your decoder in order to present the motion video. CPU burn, so get the GPU to help.
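As a rough sketch of those two steps, the Python below runs a 4x4 integer inverse transform and then a motion-compensated reconstruction. The butterfly is my reading of the H.264 core inverse transform with the dequantisation scaling left out, and the motion compensation is whole-pixel only, whereas real H.264 interpolates down to quarter-pixel positions (a big part of why it costs so much). The function names and the edge clamping are assumptions for illustration, not anything taken from ATI's documentation.

```python
def inverse_transform_4x4(coeffs):
    """4x4 integer inverse transform using shift-and-add butterflies."""
    def butterfly(d0, d1, d2, d3):
        e0 = d0 + d2
        e1 = d0 - d2
        e2 = (d1 >> 1) - d3
        e3 = d1 + (d3 >> 1)
        return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]

    # Horizontal pass over each row, then vertical pass over each column.
    rows = [butterfly(*row) for row in coeffs]
    cols = [butterfly(*[rows[r][c] for r in range(4)]) for c in range(4)]
    # cols[c][r] is the sample at row r, column c; round and divide by 64.
    return [[(cols[c][r] + 32) >> 6 for c in range(4)] for r in range(4)]


def motion_compensate(reference, x, y, mv, residual):
    """Rebuild a 4x4 block: reference pixels at (x, y) + mv, plus residual."""
    mvx, mvy = mv
    height, width = len(reference), len(reference[0])
    block = []
    for r in range(4):
        row = []
        for c in range(4):
            # Clamp so vectors pointing off-frame reuse the edge pixels.
            ry = min(max(y + mvy + r, 0), height - 1)
            rx = min(max(x + mvx + c, 0), width - 1)
            value = reference[ry][rx] + residual[r][c]
            row.append(min(max(value, 0), 255))   # keep within 8-bit range
        block.append(row)
    return block


# A DC-only coefficient block decodes to a flat residual.
dc_only = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
residual = inverse_transform_4x4(dc_only)

# Toy 8x8 reference frame where each pixel encodes its own position.
reference = [[10 * row + col for col in range(8)] for row in range(8)]
print(motion_compensate(reference, 0, 0, (1, 2), residual))
```

The transform is a fixed handful of adds and shifts per block, which is why it's cheap as a single function. The motion compensation fetch pattern depends entirely on the vectors in the stream, and that's where the variable cost comes from.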
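Finally, in-loop deblocking is the act of smoothing the edges between blocks as part of the decode loop itself, so the filtered frame is what later frames get predicted from, rather than it being a cosmetic pass bolted on at the end. Stepping transitions between video blocks bad, deblocking good. GPU help!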
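A stripped-down sketch of the idea, again in Python: look at the two pixels either side of a block edge, and only smooth if the step is small enough to be a coding artifact rather than a real edge in the picture. The smoothing formula is the simple 4-tap form as I understand it from H.264; the alpha and beta thresholds here are made-up constants (in the real filter they depend on the quantiser and a boundary strength decision), and deblock_edge is a hypothetical name.

```python
def deblock_edge(p1, p0, q0, q1, alpha=10, beta=4):
    """Filter one line of pixels across a block edge: p1 p0 | q0 q1.

    alpha and beta are illustrative thresholds, not values from the spec.
    Returns the (possibly smoothed) p0 and q0.
    """
    # A big step, or busy detail on either side, is treated as real
    # picture content and left untouched.
    if abs(p0 - q0) >= alpha or abs(p1 - p0) >= beta or abs(q1 - q0) >= beta:
        return p0, q0
    new_p0 = (2 * p1 + p0 + q1 + 2) >> 2
    new_q0 = (2 * q1 + q0 + p1 + 2) >> 2
    return new_p0, new_q0


print(deblock_edge(100, 100, 104, 104))   # small step: gets softened
print(deblock_edge(100, 100, 200, 200))   # genuine edge: left alone
```

Because the filtered pixels feed the next frame's prediction, the decoder can't skip or approximate this step without drifting away from what the encoder expected.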