// Lock the buffer before reading any of its data
CVPixelBufferLockBaseAddress(pixelBuffer, 0);

// Plane 0 is the luma (Y) plane
const size_t lumaPlaneIndex = 0;
size_t lumaPlaneWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, lumaPlaneIndex);
size_t lumaPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, lumaPlaneIndex);

// Plane 1 is the interleaved chroma (CbCr) plane
const size_t cbcrPlaneIndex = 1;
size_t cbcrPlaneWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, cbcrPlaneIndex);
size_t cbcrPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, cbcrPlaneIndex);

NSLog(@"lumaPlaneWidth: %zu", lumaPlaneWidth);
NSLog(@"lumaPlaneHeight: %zu", lumaPlaneHeight);
NSLog(@"cbcrPlaneWidth: %zu", cbcrPlaneWidth);
NSLog(@"cbcrPlaneHeight: %zu", cbcrPlaneHeight);

CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);

Output on my iPhone 5 running iOS 7, for the front-facing camera, is:

lumaPlaneWidth: 1280
lumaPlaneHeight: 720
cbcrPlaneWidth: 640
cbcrPlaneHeight: 360

The luma (Y) plane is twice the size of the CbCr plane in each dimension. Why is this?

Robert
  • 1280x720 is 720p. Why are you expecting it to be half that size? Are you asking about Cb and Cr being sampled at a lower frequency? – Tommy Aug 05 '14 at 22:07
  • @Tommy - Actually you might be right, I was confused as to why the planes had different dimensions; I might have incorrectly assumed that the luma plane was wrong. I'll re-word the question. – Robert Aug 06 '14 at 09:01

1 Answer


The human eye is much more sensitive to changes in brightness than to changes in colour. It can discern brightness changes at a higher spatial frequency, so that information is usually stored at a higher sampling frequency. The motivation is simply the reality of human perception (plus, I guess, some consideration of bandwidth: you'd just capture as much as physically possible if data transmission were free).

The buffer you're getting has the Y (brightness) channel sampled at twice the rate of the Cb and Cr (colour) channels in both the horizontal and vertical directions, so there are four luma samples for every Cb/Cr pair. That's 4:2:0 chroma subsampling, and it's exactly what the 2:1 difference in plane dimensions reflects.
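
To make the layout concrete, here is a minimal sketch (my own illustration, not part of the original answer) of how you would read the Y and Cb/Cr values for a pixel at coordinates (x, y) in a bi-planar 4:2:0 buffer like the one above; it assumes pixelBuffer, x and y are already in scope. Both chroma coordinates are halved before indexing, which is exactly why the CbCr plane is half the width and half the height of the Y plane:

CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);

// Plane 0: one 8-bit luma sample per pixel
uint8_t *yBase = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
size_t yPitch = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);
uint8_t luma = yBase[y * yPitch + x];

// Plane 1: one interleaved Cb/Cr pair per 2x2 block of pixels,
// so both coordinates are halved before indexing
uint8_t *cbCrBase = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1);
size_t cbCrPitch = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1);
uint8_t cb = cbCrBase[(y >> 1) * cbCrPitch + (x >> 1) * 2];
uint8_t cr = cbCrBase[(y >> 1) * cbCrPitch + (x >> 1) * 2 + 1];

CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);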

Furthermore, 99.9999% of digital cameras capture using a colour filter array (almost always a Bayer filter) which means they don't actually capture the full colour at every site; they capture individual primary components at adjoining sites and then combine them mathematically. That problem becomes non-trivial if you want a really good estimate of the true signal. If the consumer is only expected to need 4:2:0, it's cheaper to demosaic directly to 4:2:0. That's why the API isn't giving you 4:4:4, even though it doesn't know what you intend to do with the data.
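
For illustration only, here is a very rough sketch (my own, assuming an RGGB Bayer layout; none of this comes from the answer) of the cheapest possible way to combine adjoining sensor sites: treat each 2x2 Bayer block as one output sample. Real demosaicing interpolates a full-resolution image, but notice how each 2x2 block naturally yields one chroma pair, which lines up neatly with the 4:2:0 layout described above:

// One 2x2 RGGB block:   R  G
//                       G  B
// Combine it into a single Y sample and one Cb/Cr pair (BT.601 full-range).
static void demosaicBlockToYCbCr(const uint8_t *bayer, size_t bytesPerRow,
                                 size_t blockX, size_t blockY,
                                 uint8_t *outY, uint8_t *outCb, uint8_t *outCr)
{
    const uint8_t *row0 = bayer + (blockY * 2) * bytesPerRow + blockX * 2;
    const uint8_t *row1 = row0 + bytesPerRow;

    float r = row0[0];
    float g = (row0[1] + row1[0]) / 2.0f;   // average the two green sites
    float b = row1[1];

    *outY  = (uint8_t)( 0.299f * r + 0.587f * g + 0.114f * b);
    *outCb = (uint8_t)(-0.169f * r - 0.331f * g + 0.500f * b + 128.0f);
    *outCr = (uint8_t)( 0.500f * r - 0.419f * g - 0.081f * b + 128.0f);
}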

Tommy
  • Ah cool - thanks for your answer. I understood that there is more information in the luma plane, since the Cb and Cr samples are interleaved in the second plane, something like: `YYYYYYYYYYYY CbCrCbCrCbCr`. However, if the CbCr plane is 1/2 the size in each dimension then it is 1/4 of the overall area, and if only half of that plane is Cb and half is Cr then isn't it `8:1:1`? ...Maybe I'm missing something? – Robert Aug 06 '14 at 21:20
  • Also, I have just seen your answer here: http://stackoverflow.com/a/8838723/296446. If I understand it correctly, it accounts for the difference in plane sizes by bit-shifting the index, like so: `uint8_t *cbCrBufferLine = &cbCrBuffer[(y >> 1) * cbCrPitch];` – Robert Aug 06 '14 at 21:25