A comprehensive definition of the H.264 Algorithm

Question

I have been reading scores of papers on the H.264 algorithm (see H.264 codec explained), and all of them make certain assumptions that make understanding the algorithm impossible. For example, Alexander Herman's H.264/MPEG-4 Advanced Video Coding says:

The intra frame Prediction predicts the values of a block, by using previously decoded data in a frame.

But it doesn't explain what the prediction actually is.

Do we randomly pick a number?
Do we haphazardly copy previously predicted values?
Do we close our eyes and wait until a value comes to us?

Is there a good document out there that explains H.264 in detail?

score 5 · Accepted Answer · answered Oct 30 '15 at 20:06

"The H.264 Advanced Video Compression Standard" by Iain Richardson is the standard book. For full details, see the specification itself (ITU-T Rec. H.264).

Each pixel is produced by combining a prediction with a residual.

In an intra frame, the prediction for a square block of pixels is made by copying the previously decoded pixels to the left of or above that block. (Which pixels to copy is specified by bits in the bitstream, and in some modes the prediction is formed from a filtered version of those pixels instead of a straight copy.)
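To make "copying the pixels to the left or above" concrete, here is a minimal sketch of two of the nine 4x4 intra prediction modes in H.264: vertical (mode 0) and horizontal (mode 1). The function names and the `above`/`left` lists are hypothetical, chosen for illustration; the other modes (DC and the diagonal ones) are not shown.

```python
# Hypothetical helpers sketching two of the nine H.264 Intra_4x4 modes.
# `above` holds the 4 decoded pixels directly above the block,
# `left` holds the 4 decoded pixels directly to its left.

def predict_vertical(above):
    """Mode 0 (vertical): every row is a copy of the row above the block."""
    return [list(above) for _ in range(4)]


def predict_horizontal(left):
    """Mode 1 (horizontal): every column is a copy of the column to the left."""
    return [[left[y]] * 4 for y in range(4)]


if __name__ == "__main__":
    for row in predict_vertical([100, 102, 104, 106]):
        print(row)
    for row in predict_horizontal([90, 95, 100, 105]):
        print(row)
```

Nothing random is involved: the mode number read from the bitstream tells the decoder which of these deterministic copies to make.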

For the very first block in an image, there are no previously decoded pixels, so the prediction is set to the mid-grey value 128 (for 8-bit video).
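The value 128 comes from the DC prediction mode, which averages whichever neighbouring pixels are available and falls back to mid-grey when none are. The sketch below follows the 4x4 DC rule as I understand it from the standard; the function name and the use of `None` to mark an unavailable neighbour are illustrative assumptions.

```python
def predict_dc(above=None, left=None, bit_depth=8):
    """Sketch of Intra_4x4 DC prediction (mode 2).

    `above` / `left` are lists of 4 decoded neighbour pixels, or None when
    that neighbour is unavailable (e.g. at the top or left picture edge).
    """
    if above is not None and left is not None:
        dc = (sum(above) + sum(left) + 4) >> 3   # rounded mean of all 8 neighbours
    elif left is not None:
        dc = (sum(left) + 2) >> 2                # rounded mean of the 4 left neighbours
    elif above is not None:
        dc = (sum(above) + 2) >> 2               # rounded mean of the 4 above neighbours
    else:
        dc = 1 << (bit_depth - 1)                # no neighbours: mid-grey, 128 for 8-bit
    return [[dc] * 4 for _ in range(4)]
```

For the top-left block of a picture, `predict_dc()` is called with no neighbours and every predicted pixel is 128.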

Once you have a prediction, a value called the residual is added to it to form the final pixel value (assuming the deblocking filter is turned off). The residual is carried in the bitstream, but not directly: what is stored is a transformed (and quantized) version of it, because the transform means fewer bits are needed to encode it.
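To see why transforming the residual saves bits, here is a sketch of the 4x4 core forward integer transform, Y = CF · X · CF^T, applied to a flat residual block. The scaling and quantisation steps that follow it in a real encoder are deliberately omitted, and the helper names are hypothetical.

```python
# Core 4x4 forward integer transform matrix used in H.264
# (the post-scaling / quantisation stage is omitted in this sketch).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(residual):
    """Return CF * X * CF^T for a 4x4 residual block X."""
    return matmul(matmul(CF, residual), transpose(CF))


if __name__ == "__main__":
    flat = [[4] * 4 for _ in range(4)]     # residual: every pixel needs +4
    coeffs = forward_core_transform(flat)
    nonzero = sum(1 for row in coeffs for c in row if c != 0)
    print(coeffs[0][0], nonzero)           # prints "64 1"
```

Sixteen identical residual values collapse into a single non-zero coefficient, which is far cheaper to entropy-code than sixteen separate numbers.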

So, in summary, the bitstream first specifies a number which says which method to use to copy/filter previously decoded pixels to form a prediction, and another set of numbers which specify what value to add to this prediction to get the final pixels.
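The final step the summary describes, adding the residual to the prediction, can be sketched as below. The clip to the valid sample range (0 to 255 for 8-bit video) is part of H.264 reconstruction; the function name is hypothetical and deblocking is assumed to be off.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add a decoded residual to a prediction and clip to the valid sample range."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


if __name__ == "__main__":
    prediction = [[128] * 4 for _ in range(4)]   # e.g. the very first block
    residual = [[4, -60, 0, 200] for _ in range(4)]
    print(reconstruct_block(prediction, residual)[0])  # [132, 68, 128, 255]
```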

The aim is that the prediction is very close to the actual image so few bits need to be spent on the residual.
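On the encoder side, this aim is what drives the choice of mode. Below is a simplified sketch that tries three modes and keeps the one with the smallest sum of absolute differences (SAD) between the real block and the prediction. Using plain SAD as the cost is an assumption made for brevity; real encoders typically use more elaborate cost measures that also account for the bits each choice costs. The function names are hypothetical.

```python
def sad(block, prediction):
    """Sum of absolute differences: a rough proxy for how large the residual is."""
    return sum(
        abs(b - p)
        for brow, prow in zip(block, prediction)
        for b, p in zip(brow, prow)
    )


def choose_mode(block, above, left):
    """Pick the candidate prediction closest to the actual 4x4 block."""
    candidates = {
        "vertical": [list(above) for _ in range(4)],
        "horizontal": [[left[y]] * 4 for y in range(4)],
        "dc": [[(sum(above) + sum(left) + 4) >> 3] * 4 for _ in range(4)],
    }
    costs = {name: sad(block, pred) for name, pred in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    above = [10, 50, 90, 130]
    left = [70, 70, 70, 70]
    block = [[10, 50, 90, 130] for _ in range(4)]  # vertical stripes
    print(choose_mode(block, above, left))
```

Here the vertical mode predicts the striped block exactly, so its residual is all zeros and almost no bits are needed for it.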

Yes, the prediction for all the pixels in the first block (e.g. the 4x4 pixels at the top-left of the image) will be 128, but then they will usually have a non-zero residual added to them, so the final pixel value can be more or less than 128 even for the first pixel in the image. — Peter de Rivaz, Oct 31 '15 at 20:04
What does "non-zero residual" mean in this context? That is what I am trying to understand here, all the nuances that people gloss over. — puk, Nov 03 '15 at 15:34
The residual is computed from the bitstream. It is a signed number (e.g. +4 or -60) that tells you how much to add to the prediction to get the output pixel value. The computation to go from bitstream to residual is based on an integer transform that does simple add and subtract operations on a batch of 16 numbers. — Peter de Rivaz, Nov 03 '15 at 17:12
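The "add and subtract" computation mentioned in the last comment can be sketched as the 4x4 inverse core transform: a butterfly applied to each row, then to each column, followed by a rounding shift. Besides additions and subtractions it also uses a few right shifts by 1. The input is assumed to be coefficients that have already been entropy-decoded and dequantised (that scaling step is not shown), and the function names are hypothetical.

```python
def _inverse_1d(d0, d1, d2, d3):
    """One 4-point butterfly: only adds, subtracts and a right shift by 1."""
    e0 = d0 + d2
    e1 = d0 - d2
    e2 = (d1 >> 1) - d3
    e3 = d1 + (d3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_core_transform(coeffs):
    """Turn a 4x4 block of dequantised coefficients into signed residuals."""
    rows = [_inverse_1d(*row) for row in coeffs]  # horizontal pass
    # Vertical pass: cols[x] holds column x, indexed by y.
    cols = [_inverse_1d(*(rows[y][x] for y in range(4))) for x in range(4)]
    # Round and rescale; Python's >> is an arithmetic shift for negative values.
    return [[(cols[x][y] + 32) >> 6 for x in range(4)] for y in range(4)]


if __name__ == "__main__":
    dc_only = [[-256, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(inverse_core_transform(dc_only)[0])  # [-4, -4, -4, -4]
```

A single DC coefficient spreads into the same signed residual for all 16 pixels, which is then added to the prediction.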
