6

I saw the following explanation of motion estimation/compensation for MPEG-1 and was wondering whether it is correct:

Why don't we just code the raw difference between the current block and the reference block? Because the numbers for the residual are usually going to be a lot smaller. For example, say an object accelerates across the image, and its x position over 11 consecutive frames is:

    12 16 20 25 31 38 48 59 72 84 96

The raw differences would be:

    x  4  4  5  6  7 10 11 13 12 12

So the predicted values would be:

    x  x 20 24 30 37 45 58 70 85 96

and the residuals would be:

    x  x  0  1  1  1  3  1  2 -1  0

Is the prediction for frame[i+1] = (frame[i] - frame[i-1]) + frame[i], i.e. do we add the motion between the previous two reference frames to the most recent reference frame? Then we encode the prediction residual, which is the actual captured frame[i+1] minus the predicted frame[i+1], and send this to the decoder?

user915071
  • 175
  • 1
  • 1
  • 6
  • I would advise reading this answer of mine to clear up most of your doubts: http://dsp.stackexchange.com/questions/986/how-do-the-motion-vectors-work-in-predictive-coding-for-mpeg/1023#1023 Once you have read it, you can refine your question. – Dipan Mehta Jan 22 '12 at 13:29
  • Hi Dipan. To be honest, I've seen answers like that before. My question is more fundamental. All I want to know is: is the prediction for frame[i+1] computed as I outlined in my first post? Or is the frame predicted as follows: prediction frame[i+1] = (frame[i+1] - frame[i]) + frame[i]? In other words, is the motion vector for the current frame computed from the previous two reference frames, or from the current frame and the previous reference frame? Thanks – user915071 Jan 22 '12 at 16:30
  • `frame[i+1] = (frame[i] - frame[i-1]) + frame[i]` is wrong! Please *read* my answer; it will tell you whether your premise `actual captured shot of frame[i+1] - prediction frame[i+1]` is correct. – Dipan Mehta Jan 23 '12 at 02:06

1 Answer

3

MPEG1 decoding (motion compensation) works like this:

The predictions and motion vectors turn a reference frame into the next (current) frame. Here's how you would calculate each pixel of the new frame:

For each macroblock, you have a set of predicted values (differences from the reference frame). The motion vector is an offset into the reference frame.

    // Each luma and chroma block is 8x8 pixels
    for (y = 0; y < 8; y++)
    {
        for (x = 0; x < 8; x++)
        {
            NewPixel(x, y) = Prediction(x, y) + RefPixel(x + motion_vector_x, y + motion_vector_y);
        }
    }

With MPEG1 you have I, P and B frames. I frames are completely intra-coded (similar to JPEG), with no references to other frames. P frames are coded with predictions from the previous reference frame (either I or P). B frames are coded with predictions from both directions (the previous and the next reference frame). B frames make the video player a little more complicated because a B frame may reference the *next* frame; therefore each frame carries a sequence number, and B frames cause the transmission order to differ from the display order. In other words, your video decoder potentially needs to hold on to 3 frames while decoding a stream (previous, current and next).

BitBank
  • 8,500
  • 3
  • 28
  • 46