
In Apple's documentation for AVAssetReaderTrackOutput, the outputSettings parameter of +[AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:outputSettings:] is described as follows:

A value of nil configures the output to vend samples in their original format as stored by the specified track.

When using it on e.g. an MP4 video asset, the reader seemingly steps through frames in decode order (i.e. out of order with respect to display order), but every query against the delivered CMSampleBufferRef objects using CMSampleBufferGetImageBuffer yields a NULL CVImageBufferRef.

The only way I can ensure delivery of image buffers is to provide a pixel buffer format in outputSettings, such as kCVPixelFormatType_32ARGB for the kCVPixelBufferPixelFormatTypeKey dictionary entry.
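For context, this is roughly how I set that up (a sketch; inputURL is a placeholder for the actual asset URL):

    // Sketch: forcing a concrete pixel format via outputSettings.
    // inputURL is a placeholder for the actual asset URL.
    AVAsset *asset = [AVURLAsset URLAssetWithURL:inputURL options:nil];
    AVAssetTrack *videoTrack = [[asset tracksWithMediaType:AVMediaTypeVideo] firstObject];

    NSDictionary *outputSettings =
        @{ (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32ARGB) };
    AVAssetReaderTrackOutput *output =
        [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:videoTrack
                                                   outputSettings:outputSettings];
    // With this in place, CMSampleBufferGetImageBuffer returns a non-NULL
    // CVImageBufferRef, and frames arrive in display (presentation) order.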

Another interesting side effect of doing this is that frames are then delivered in display order, with the underlying decode order abstracted/hidden away.

Any ideas why this is so?

Dan
  • How do you know the frames are out of order? From the presentation timestamps? Are you sure they're frames? p.s. what do you mean by "original format"? – Rhythmic Fistman May 05 '18 at 15:09
  • The frames appear to be out-of-order due to the presentation time stamps, yes. I'm uncertain if they contain frames, but they track what would be the decode order of frames otherwise, via e.g. other APIs like QuickTime. By original format, I was quoting what Apple's documentation indicates — and the idea is to avoid any unnecessary pixel format transformations between what was natural for the encoded stream to deliver, and what I end up having going forward. – Dan May 06 '18 at 08:59
  • Maybe GOP B-frames could explain out of order timestamps? How out of order are we talking? https://en.wikipedia.org/wiki/Video_compression_picture_types – Rhythmic Fistman May 06 '18 at 09:34

1 Answer


Like you, I expected that setting outputSettings to nil would result in output of native-format video frames, but this is not the case: you must specify something in order to get a valid CVImageBufferRef from each sample buffer.

All is not lost: using a "barely there" dictionary seems to output frames in their native format,

AVAsset *asset = [AVURLAsset URLAssetWithURL:inputURL options:nil];
AVAssetTrack *videoTrack = [[asset tracksWithMediaCharacteristic:AVMediaCharacteristicVisual] objectAtIndex:0];

// An empty IOSurface properties entry is enough to get decoded pixel buffers
// without forcing a particular pixel format.
NSDictionary *decompressionSettings =
     @{ (id)kCVPixelBufferIOSurfacePropertiesKey : [NSDictionary dictionary] };
AVAssetReaderTrackOutput *trackOutput = [[AVAssetReaderTrackOutput alloc] initWithTrack:videoTrack outputSettings:decompressionSettings];
...
...

The IOSurface properties are simply left at their defaults - further reading for reference: https://developer.apple.com/documentation/corevideo/kcvpixelbufferiosurfacepropertieskey?language=objc
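To complete the picture, a minimal read loop using this track output might look like the following (a sketch; error handling is elided, and asset/trackOutput are assumed to be the objects created above):

    NSError *error = nil;
    AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:asset error:&error];
    [reader addOutput:trackOutput];
    [reader startReading];

    CMSampleBufferRef sampleBuffer = NULL;
    while ((sampleBuffer = [trackOutput copyNextSampleBuffer]) != NULL) {
        // With the IOSurface-backed settings above, this should be non-NULL.
        CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
        if (imageBuffer) {
            // ... process the frame ...
        }
        CFRelease(sampleBuffer); // copyNextSampleBuffer follows the Create rule
    }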

koan