
Referring to this page: http://bigflake.com/mediacodec/

A5. The color formats for the camera output and the MediaCodec encoder input are different. Camera supports YV12 (planar YUV 4:2:0) and NV21 (semi-planar YUV 4:2:0). The MediaCodec encoders support one or more of:

#19 COLOR_FormatYUV420Planar (I420)

#20 COLOR_FormatYUV420PackedPlanar (also I420)

#21 COLOR_FormatYUV420SemiPlanar (NV12)

#39 COLOR_FormatYUV420PackedSemiPlanar (also NV12)

#0x7f000100 COLOR_TI_FormatYUV420PackedSemiPlanar (also also NV12)

In my application, I am capturing frames from an external camera in YUY2 format, converting them to a usable format, and feeding them to a MediaMuxer.

Based on what I've read here, this means that I need to query what the device supports with MediaCodecInfo.CodecCapabilities. Then, based on that, do a conversion from YUY2 to the appropriate format. At least this is my understanding.
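A minimal sketch of that query step. The numeric values mirror the constants listed in the question; on a real device you would iterate MediaCodecList and read `capabilities.colorFormats` from `getCapabilitiesForType(mimeType)` instead of passing an array in yourself, and the class and method names here are my own:

```java
// Sketch: pick a usable encoder input color format from a list of
// supported formats, as MediaCodecInfo.CodecCapabilities.colorFormats
// would report them. Constants match the values quoted in the question.
public final class FormatPicker {
    public static final int COLOR_FormatYUV420Planar              = 19;          // I420
    public static final int COLOR_FormatYUV420PackedPlanar        = 20;          // also I420
    public static final int COLOR_FormatYUV420SemiPlanar          = 21;          // NV12
    public static final int COLOR_FormatYUV420PackedSemiPlanar    = 39;          // also NV12
    public static final int COLOR_TI_FormatYUV420PackedSemiPlanar = 0x7f000100;  // TI NV12 variant

    // True for formats with interleaved chroma (NV12-style),
    // false for the planar (I420-style) ones.
    public static boolean isSemiPlanar(int colorFormat) {
        switch (colorFormat) {
            case COLOR_FormatYUV420SemiPlanar:
            case COLOR_FormatYUV420PackedSemiPlanar:
            case COLOR_TI_FormatYUV420PackedSemiPlanar:
                return true;
            default:
                return false;
        }
    }

    // Return the first format in the encoder's list we know how to fill,
    // or -1 if none matches.
    public static int pick(int[] supported) {
        for (int f : supported) {
            if (f == COLOR_FormatYUV420Planar
                    || f == COLOR_FormatYUV420PackedPlanar
                    || isSemiPlanar(f)) {
                return f;
            }
        }
        return -1;
    }
}
```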

In the sea of codec formats, I am unsure of the differences between these and whether or not I need to account for all of them in my application. If so, I need to know the byte layout of these formats. I've filled in the ones I think are correct. Starting from the top and moving down:

FormatYUV420Planar - YYYY YYYY UU VV

FormatYUV420PackedPlanar - ???

FormatYUV420SemiPlanar -- YYYY YYYY UV UV

FormatYUV420PackedSemiPlanar -- ???

COLOR_TI_FormatYUV420PackedSemiPlanar -- ???
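For reference, here is roughly how I'm handling the YUY2 to 4:2:0 semi-planar conversion. This is a sketch under my own assumptions (even width and height, chroma simply taken from even rows only to go from 4:2:2 to 4:2:0), and the class name is made up:

```java
// Sketch: convert a packed YUY2 (YUYV 4:2:2) frame to NV12 (4:2:0,
// Y plane followed by interleaved UV). Assumes even width and height.
public final class Yuy2ToNv12 {
    public static byte[] convert(byte[] yuy2, int width, int height) {
        byte[] nv12 = new byte[width * height * 3 / 2];
        int uvBase = width * height;  // start of the interleaved UV plane
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col += 2) {
                int in = (row * width + col) * 2;  // 2 bytes per pixel in YUY2
                // Copy the two luma samples of this pixel pair.
                nv12[row * width + col]     = yuy2[in];      // Y0
                nv12[row * width + col + 1] = yuy2[in + 2];  // Y1
                // Take chroma only from even rows (4:2:2 -> 4:2:0 subsampling).
                if ((row & 1) == 0) {
                    int uv = uvBase + (row / 2) * width + col;
                    nv12[uv]     = yuy2[in + 1];  // U
                    nv12[uv + 1] = yuy2[in + 3];  // V
                }
            }
        }
        return nv12;
    }
}
```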

Christopher Schneider
  • One useful "reference" is the CTS test referenced from bigflake that generates frames in the various formats: https://android.googlesource.com/platform/cts/+/lollipop-release/tests/tests/media/src/android/media/cts/EncodeDecodeTest.java#971 . This must work on all devices, so if you do what it does you should be fine. (Bear in mind, though, that it only tests specific resolutions.) – fadden Apr 14 '15 at 15:31
  • From the point of view of the CTS, which these devices must pass, FormatYUV420PackedPlanar is equal to FormatYUV420Planar, and COLOR_TI_FormatYUV420PackedSemiPlanar is equal to FormatYUV420SemiPlanar and FormatYUV420PackedSemiPlanar. (e.g. COLOR_TI_FormatYUV420PackedSemiPlanar has got some weird details only in the form that the decoder outputs it, but when feeding data to an encoder you can do it exactly as for FormatYUV420SemiPlanar). – mstorsjo Apr 15 '15 at 05:51

1 Answer


Packed formats are those in which all 3 components are packed together in one plane. Please refer to the following links for more information on the different color formats that are widely employed in video pipelines.

  1. fourcc

  2. videoLan

  3. Microsoft

For COLOR_TI_FormatYUV420PackedSemiPlanar, I would recommend referring to the color conversion function in ColorConverter, linked here. It is similar to YUV420SemiPlanar, but has some specific differences in the way the data is picked up.

Ganesh
  • Yes, packed formats have all color formats in one plane, but these are named "PackedPlanar" and "PackedSemiPlanar". As far as I know, this naming is only for historical reasons; originally, if following the OMX spec in great detail, the normal Planar formats would be one buffer per plane, while PackedPlanar would be all three (or two for semiplanar) planes packed into one single buffer. In practice the naming is mostly confusing, and from the point of view of the CTS (which specifies what kind of input the encoders need to support), they are equivalent. – mstorsjo Apr 15 '15 at 05:53
  • Also, the extra handling specific to `COLOR_TI_FormatYUV420PackedSemiPlanar` in the color converter you linked is only relevant for interpreting data from the decoder. (This color format has got a nonzero offset, and a nonzero top/left cropping. You either ignore the offset and apply the cropping, or ignore the top/left cropping and reduce cropTop/2 lines from the offset to the chroma plane.) For input to the encoder one doesn't need to bother about these details at all, since one chooses the offset manually (and doesn't need to supply any extra padding for cropping). – mstorsjo Apr 15 '15 at 05:55
  • @mstorsjo Encoders typically support planar or semi-planar data. Packed data mainly comes from cameras, which pack all 3 color planes into a stream of bytes to avoid buffering. More often than not, unless specifically designed to support such formats, encoders don't support packed data formats. `NV12` or `NV21` are the most common choices, with planar being the last. – Ganesh Apr 15 '15 at 13:48
  • For the TI format, the crop window arithmetic is a generic handling and I don't think it is part of the color format. This cropping-based handling could be the same even for planar or semi-planar or any of the other formats discussed above. The actual color format lies in lines 459-463 where individual `luma` and `cb/cr` components are derived. An encoder has to factor in the crop window for the input frame/surface coming into it to pick up the right data. Else, it will end up encoding garbage data. Adhering to `OMX` crop syntax is of foremost importance unless one is using `metadata` format – Ganesh Apr 15 '15 at 13:54
  • Yes, an encoder needs to factor in the crop window, but conversely, the code calling an encoder doesn't need to think about how to handle a non-zero top/left cropping, because precisely that code itself is the one that sets the cropping, and can choose to set it to zero. – mstorsjo Apr 15 '15 at 15:16
  • Also, for the cropping, the thing that is TI specific is lines 454-455. See https://android-review.googlesource.com/62690 and https://android-review.googlesource.com/113405. To get the first pixel of the Y plane, you would normally use offset + cropTop * width + cropLeft, right? But for the TI pixel format, you either need to use only cropTop * width + cropLeft (which the CTS EncodeDecodeTest used to do in the last version that supported Galaxy Nexus), or use offset, ignore cropTop/cropLeft for the Y plane, and subtract cropTop/2 for the start of the UV plane (lines 454-455 in your link). – mstorsjo Apr 15 '15 at 15:20
  • It seems that CTS does treat them the same and works with all the color formats I listed above. The only thing it's checking is if the format is semi-planar or not. – Christopher Schneider Apr 15 '15 at 15:36
  • @mstorsjo.. For luma, I presume the calculation would be `ptr + croptop * width + cropleft`. I couldn't understand why another offset would be present. I acknowledge the fact that `UV` is picked up in a different way as compared to traditional color formats, which is expected as this is a custom color format. Moreover, even `luma` is picked up directly from `src.mBits` which means that `cropTop` and `cropLeft` should ideally be zero. I now understand your concern, which is mainly due to the different handling of `luma` & `chroma` and no documentation for the same – Ganesh Apr 15 '15 at 16:22
  • In this particular case, `src.mBits` corresponds to `pBuffer + nOffset` from `OMX_BUFFERHEADERTYPE`. As you said, luma should ideally be picked up directly from `src.mBits` without looking at `cropTop` and `cropLeft` - but the conflicting thing is that those values aren't necessarily zero. Anyway, all of this is only necessary when interpreting such buffers from a TI decoder - when using TI encoders, set cropTop/cropLeft to zero (MediaCodec doesn't even allow passing these parameters for encoders), and it will behave just the same as the normal semiplanar formats (NV12). – mstorsjo Apr 15 '15 at 16:40