I've seen the following explanation of motion estimation/compensation in MPEG-1 and was wondering whether it is correct:
Why don't we just code the raw difference between the current block and the reference block? Because the residual values are usually going to be a lot smaller. For example, say an object accelerates across the image, and its x position over 11 frames is:

12 16 20 25 31 38 48 59 72 84 96

The raw differences between consecutive positions would be:

x 4 4 5 6 7 10 11 13 12 12

So the predicted values would be:

x x 20 24 30 37 45 58 70 85 96

So the residuals (actual minus predicted) are:

x x 0 1 1 1 3 1 2 -1 0
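To check the arithmetic in that explanation, here is a small Python sketch. It assumes the prediction is linear extrapolation (each new position is the previous position plus the most recent step), which is what the numbers above imply. It is a one-dimensional toy on a single coordinate, not MPEG-1's actual block-based motion vector coding.

```python
# Worked example from the quoted explanation: an object's x position over 11 frames.
positions = [12, 16, 20, 25, 31, 38, 48, 59, 72, 84, 96]

# Raw differences: how far the object moved between consecutive frames.
diffs = [positions[i] - positions[i - 1] for i in range(1, len(positions))]

# Assumed predictor (linear extrapolation): keep moving by the last observed step,
# i.e. prediction[i+1] = positions[i] + (positions[i] - positions[i-1]).
predictions = [
    positions[i] + (positions[i] - positions[i - 1])
    for i in range(1, len(positions) - 1)
]

# Residual = actual value minus predicted value, starting from the third frame.
residuals = [actual - pred for actual, pred in zip(positions[2:], predictions)]

print("diffs:      ", diffs)
print("predictions:", predictions)
print("residuals:  ", residuals)
```

Running it reproduces the three rows of the explanation: differences `[4, 4, 5, 6, 7, 10, 11, 13, 12, 12]`, predictions `[20, 24, 30, 37, 45, 58, 70, 85, 96]` and residuals `[0, 1, 1, 1, 3, 1, 2, -1, 0]`. The residuals stay much smaller than the raw differences, which is the point the explanation is making.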
Is the prediction for frame[i+1] = frame[i] + (frame[i] - frame[i-1]), i.e., is the motion vector between the two previous reference frames added to the most recent reference frame? And do we then encode the prediction residual (the actual captured frame[i+1] minus the predicted frame[i+1]) and send that to the decoder?
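The encode/decode loop described in the question can be sketched in Python under the same toy assumptions: a single scalar position, the linear-extrapolation predictor frame[i] + (frame[i] - frame[i-1]), and residuals sent without loss. The function names `predict`, `encode` and `decode` are hypothetical and only illustrate the idea; this is not the MPEG-1 bitstream format.

```python
def predict(prev, prev2):
    """Hypothetical predictor: prev + (prev - prev2), i.e. linear extrapolation."""
    return prev + (prev - prev2)


def encode(values):
    """Send the first two values as-is, then only the prediction residuals."""
    header = values[:2]
    residuals = [
        values[i] - predict(values[i - 1], values[i - 2])
        for i in range(2, len(values))
    ]
    return header, residuals


def decode(header, residuals):
    """Rebuild the sequence: each value = prediction from already-decoded values + residual."""
    out = list(header)
    for r in residuals:
        out.append(predict(out[-1], out[-2]) + r)
    return out


positions = [12, 16, 20, 25, 31, 38, 48, 59, 72, 84, 96]
header, residuals = encode(positions)
decoded = decode(header, residuals)
print(header, residuals)
print(decoded == positions)
```

The decoder never sees the actual later positions; it forms the same prediction from values it has already reconstructed and adds the received residual. Because the residuals here are exact, the encoder's predictions (made from the original values) match the decoder's, so the round trip is lossless.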