For macOS and iOS, I have streams of real-time encoded video (H.264) and audio (AAC) data coming in, and I want to mux these together into an MP4. I'm using an AVAssetWriter to perform the muxing.
I have video working, but my audio still sounds like jumbled static. Here's what I'm trying right now (skipping some of the error checks here for brevity):
I initialize the writer:
NSURL *url = [NSURL fileURLWithPath:mContext->filename];
NSError* err = nil;
mContext->writer = [AVAssetWriter assetWriterWithURL:url fileType:AVFileTypeMPEG4 error:&err];
I initialize the audio input:
NSDictionary* settings;
AudioChannelLayout acl;
bzero(&acl, sizeof(acl));
acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
settings = nil; // set output to nil so it becomes a pass-through
CMAudioFormatDescriptionRef audioFormatDesc = nil;
{
AudioStreamBasicDescription absd = {0};
absd.mSampleRate = mParameters.audioSampleRate; //known sample rate
absd.mFormatID = kAudioFormatMPEG4AAC;
absd.mFormatFlags = kMPEG4Object_AAC_Main;
CMAudioFormatDescriptionCreate(NULL, &absd, 0, NULL, 0, NULL, NULL, &audioFormatDesc);
}
mContext->aacWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:settings sourceFormatHint:audioFormatDesc];
mContext->aacWriterInput.expectsMediaDataInRealTime = YES;
[mContext->writer addInput:mContext->aacWriterInput];
And start the writer:
[mContext->writer startWriting];
[mContext->writer startSessionAtSourceTime:kCMTimeZero];
Then, I have a callback where I receive a packet with a timestamp (in milliseconds) and a std::vector<uint8_t> with the data containing 1024 compressed samples. I make sure isReadyForMoreMediaData is true (see the sketch after the next code block). Then, if this is our first time receiving the callback, I set up the CMAudioFormatDescription:
OSStatus error = 0;
AudioStreamBasicDescription streamDesc = {0};
streamDesc.mSampleRate = mParameters.audioSampleRate;
streamDesc.mFormatID = kAudioFormatMPEG4AAC;
streamDesc.mFormatFlags = kMPEG4Object_AAC_Main;
streamDesc.mChannelsPerFrame = 2; // always stereo for us
streamDesc.mBitsPerChannel = 0;
streamDesc.mBytesPerFrame = 0;
streamDesc.mFramesPerPacket = 1024; // each AAC packet covers 1024 audio frames (samples)
streamDesc.mBytesPerPacket = 0;
streamDesc.mReserved = 0;
AudioChannelLayout acl;
bzero(&acl, sizeof(acl));
acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
error = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &streamDesc, sizeof(acl), &acl, 0, NULL, NULL, &mContext->audioFormat);
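For reference, the readiness check mentioned above is just a guard at the top of that callback. It looks roughly like this, where Packet and onAudioPacket are placeholder names for our own pipeline types, not anything from AVFoundation:
// Placeholder sketch of our callback types: timestamp is in milliseconds and
// data holds a single AAC access unit covering 1024 samples.
struct Packet
{
    int64_t timestamp;
    std::vector<uint8_t> data;
};

void onAudioPacket(const Packet& packet)
{
    if (![mContext->aacWriterInput isReadyForMoreMediaData])
        return; // drop (or queue) the packet until the writer input can take more

    // ...build and append the CMSampleBufferRef as shown below
}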
And finally, I create a CMSampleBufferRef and send it along:
CMSampleBufferRef buffer = NULL;
CMBlockBufferRef blockBuffer;
CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, NULL, packet.data.size(), kCFAllocatorDefault, NULL, 0, packet.data.size(), kCMBlockBufferAssureMemoryNowFlag, &blockBuffer);
CMBlockBufferReplaceDataBytes((void*)packet.data.data(), blockBuffer, 0, packet.data.size());
CMTime duration = CMTimeMake(1024, mParameters.audioSampleRate);
CMTime pts = CMTimeMake(packet.timestamp, 1000);
CMSampleTimingInfo timing = {duration , pts, kCMTimeInvalid };
size_t sampleSizeArray[1] = {packet.data.size()};
error = CMSampleBufferCreate(kCFAllocatorDefault, blockBuffer, true, NULL, nullptr, mContext->audioFormat, 1, 1, &timing, 1, sampleSizeArray, &buffer);
// First input buffer must have an appropriate kCMSampleBufferAttachmentKey_TrimDurationAtStart since the codec has encoder delay
if (mContext->firstAudioFrame)
{
CFDictionaryRef dict = NULL;
dict = CMTimeCopyAsDictionary(CMTimeMake(1024, 44100), kCFAllocatorDefault);
CMSetAttachment(buffer, kCMSampleBufferAttachmentKey_TrimDurationAtStart, dict, kCMAttachmentMode_ShouldNotPropagate);
// we must trim the start time on first audio frame...
mContext->firstAudioFrame = false;
}
CMSampleBufferMakeDataReady(buffer);
BOOL ret = [mContext->aacWriterInput appendSampleBuffer:buffer];
I guess the part I'm most suspicious of is my call to CMSampleBufferCreate. It seems I have to pass in a sample sizes array, otherwise I get this error message immediately when checking my writer's status:
Error Domain=AVFoundationErrorDomain Code=-11800 "The operation could not be completed" UserInfo={NSLocalizedFailureReason=An unknown error occurred (-12735), NSLocalizedDescription=The operation could not be completed, NSUnderlyingError=0x604001e50770 {Error Domain=NSOSStatusErrorDomain Code=-12735 "(null)"}}
Where the underlying error appears to be kCMSampleBufferError_BufferHasNoSampleSizes.
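(The status check itself is nothing special; it's roughly this, immediately after the append:)
if (!ret || mContext->writer.status == AVAssetWriterStatusFailed)
{
    // writer.error is where the AVFoundationErrorDomain / NSOSStatusErrorDomain
    // details quoted above come from
    NSLog(@"AVAssetWriter failed: %@", mContext->writer.error);
}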
I did notice an example in Apple's documentation for creating the buffer with AAC data: https://developer.apple.com/documentation/coremedia/1489723-cmsamplebuffercreate?language=objc
In their example, they specify a long sampleSizeArray with an entry for every single sample. Is that necessary? I don't have that information in this callback, and our Windows implementation didn't need it. So I tried passing packet.data.size() as the single sample size, but that doesn't seem right, and it certainly doesn't produce pleasant audio.
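From what I can tell from the CMSampleBufferCreate docs, the size array needs at most one entry per sample in the buffer, so with numSamples = 1 a single entry holding the whole packet size should be acceptable. If I were ever batching several AAC packets into one block buffer, I assume it would have to look more like this (the counts, sizes, and timing loop below are purely illustrative, not my real code):
// Purely illustrative: three AAC access units packed into one block buffer,
// with one timing entry and one size entry per access unit.
const int kNumPackets = 3;
size_t sampleSizes[kNumPackets] = { 371, 402, 389 }; // made-up byte sizes
CMSampleTimingInfo timings[kNumPackets];
for (int i = 0; i < kNumPackets; ++i)
{
    timings[i].duration = CMTimeMake(1024, mParameters.audioSampleRate);
    timings[i].presentationTimeStamp = CMTimeMake(packet.timestamp, 1000); // would advance per packet
    timings[i].decodeTimeStamp = kCMTimeInvalid;
}
error = CMSampleBufferCreate(kCFAllocatorDefault, blockBuffer, true, NULL, nullptr,
                             mContext->audioFormat,
                             kNumPackets,              // numSamples
                             kNumPackets, timings,     // one timing entry per sample
                             kNumPackets, sampleSizes, // one size entry per sample
                             &buffer);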
Any ideas? Either tweaks to my calls here, or different APIs I should be using to mux together streams of encoded data, would be appreciated.
Thanks!