For macOS and iOS, I have streams of real-time encoded video (H.264) and audio (AAC) data coming in, and I want to mux these together into an MP4. I'm using an AVAssetWriter to perform the muxing.
I have video working, but my audio still sounds like jumbled static. Here's what I'm trying right now (skipping some of the error checks here for brevity):
I initialize the writer:
NSURL *url = [NSURL fileURLWithPath:mContext->filename];
NSError* err = nil;
mContext->writer = [AVAssetWriter assetWriterWithURL:url fileType:AVFileTypeMPEG4 error:&err];
I initialize the audio input:
NSDictionary* settings;
AudioChannelLayout acl;
bzero(&acl, sizeof(acl));
acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
settings = nil; // set output to nil so it becomes a pass-through
CMAudioFormatDescriptionRef audioFormatDesc = nil;
{
AudioStreamBasicDescription absd = {0};
absd.mSampleRate = mParameters.audioSampleRate; //known sample rate
absd.mFormatID = kAudioFormatMPEG4AAC;
absd.mFormatFlags = kMPEG4Object_AAC_Main;
CMAudioFormatDescriptionCreate(NULL, &absd, 0, NULL, 0, NULL, NULL, &audioFormatDesc);
}
mContext->aacWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:settings sourceFormatHint:audioFormatDesc];
mContext->aacWriterInput.expectsMediaDataInRealTime = YES;
[mContext->writer addInput:mContext->aacWriterInput];
And start the writer:
[mContext->writer startWriting];
[mContext->writer startSessionAtSourceTime:kCMTimeZero];
Then, I have a callback where I receive a packet with a timestamp (in milliseconds) and a std::vector<uint8_t> with the data containing 1024 compressed samples. I make sure isReadyForMoreMediaData is true (see the sketch after the next code block). Then, if this is our first time receiving the callback, I set up the CMAudioFormatDescription:
OSStatus error = 0;
AudioStreamBasicDescription streamDesc = {0};
streamDesc.mSampleRate = mParameters.audioSampleRate;
streamDesc.mFormatID = kAudioFormatMPEG4AAC;
streamDesc.mFormatFlags = kMPEG4Object_AAC_Main;
streamDesc.mChannelsPerFrame = 2; // always stereo for us
streamDesc.mBitsPerChannel = 0;
streamDesc.mBytesPerFrame = 0;
streamDesc.mFramesPerPacket = 1024; // each AAC packet covers 1024 audio frames (samples)
streamDesc.mBytesPerPacket = 0;
streamDesc.mReserved = 0;
AudioChannelLayout acl;
bzero(&acl, sizeof(acl));
acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
error = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &streamDesc, sizeof(acl), &acl, 0, NULL, NULL, &mContext->audioFormat);
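For reference, the readiness check mentioned above is just a guard at the top of that callback. It looks roughly like this, where Packet and onAudioPacket are placeholder names for our own pipeline types, not anything from AVFoundation:
// Placeholder sketch of our callback types: timestamp is in milliseconds and
// data holds a single AAC access unit covering 1024 samples.
struct Packet
{
    int64_t timestamp;
    std::vector<uint8_t> data;
};

void onAudioPacket(const Packet& packet)
{
    if (![mContext->aacWriterInput isReadyForMoreMediaData])
        return; // drop (or queue) the packet until the writer input can take more

    // ...build and append the CMSampleBufferRef as shown below
}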
And finally, I create a CMSampleBufferRef and send it along:
CMSampleBufferRef buffer = NULL;
CMBlockBufferRef blockBuffer;
CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, NULL, packet.data.size(), kCFAllocatorDefault, NULL, 0, packet.data.size(), kCMBlockBufferAssureMemoryNowFlag, &blockBuffer);
CMBlockBufferReplaceDataBytes((void*)packet.data.data(), blockBuffer, 0, packet.data.size());
CMTime duration = CMTimeMake(1024, mParameters.audioSampleRate);
CMTime pts = CMTimeMake(packet.timestamp, 1000);
CMSampleTimingInfo timing = {duration , pts, kCMTimeInvalid };
size_t sampleSizeArray[1] = {packet.data.size()};
error = CMSampleBufferCreate(kCFAllocatorDefault, blockBuffer, true, NULL, nullptr, mContext->audioFormat, 1, 1, &timing, 1, sampleSizeArray, &buffer);
// First input buffer must have an appropriate kCMSampleBufferAttachmentKey_TrimDurationAtStart since the codec has encoder delay
if (mContext->firstAudioFrame)
{
CFDictionaryRef dict = NULL;
dict = CMTimeCopyAsDictionary(CMTimeMake(1024, 44100), kCFAllocatorDefault);
CMSetAttachment(buffer, kCMSampleBufferAttachmentKey_TrimDurationAtStart, dict, kCMAttachmentMode_ShouldNotPropagate);
// we must trim the start time on first audio frame...
mContext->firstAudioFrame = false;
}
CMSampleBufferMakeDataReady(buffer);
BOOL ret = [mContext->aacWriterInput appendSampleBuffer:buffer];
I guess the part I'm most suspicious of is my call to CMSampleBufferCreate. It seems I have to pass in a sample sizes array, otherwise I get this error message immediately when checking my writer's status:
Error Domain=AVFoundationErrorDomain Code=-11800 "The operation could not be completed" UserInfo={NSLocalizedFailureReason=An unknown error occurred (-12735), NSLocalizedDescription=The operation could not be completed, NSUnderlyingError=0x604001e50770 {Error Domain=NSOSStatusErrorDomain Code=-12735 "(null)"}}
Where the underlying error appears to be kCMSampleBufferError_BufferHasNoSampleSizes.
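(The status check itself is nothing special; it's roughly this, immediately after the append:)
if (!ret || mContext->writer.status == AVAssetWriterStatusFailed)
{
    // writer.error is where the AVFoundationErrorDomain / NSOSStatusErrorDomain
    // details quoted above come from
    NSLog(@"AVAssetWriter failed: %@", mContext->writer.error);
}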
I did notice an example in Apple's documentation for creating the buffer with AAC data: https://developer.apple.com/documentation/coremedia/1489723-cmsamplebuffercreate?language=objc
In their example, they specify a long sampleSizeArray with an entry for every single sample. Is that necessary? I don't have that information in this callback, and our Windows implementation didn't need it. So I tried passing packet.data.size() as the single sample size, but that doesn't seem right, and it certainly doesn't produce pleasant audio.
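From what I can tell from the CMSampleBufferCreate docs, the size array needs at most one entry per sample in the buffer, so with numSamples = 1 a single entry holding the whole packet size should be acceptable. If I were ever batching several AAC packets into one block buffer, I assume it would have to look more like this (the counts, sizes, and timing loop below are purely illustrative, not my real code):
// Purely illustrative: three AAC access units packed into one block buffer,
// with one timing entry and one size entry per access unit.
const int kNumPackets = 3;
size_t sampleSizes[kNumPackets] = { 371, 402, 389 }; // made-up byte sizes
CMSampleTimingInfo timings[kNumPackets];
for (int i = 0; i < kNumPackets; ++i)
{
    timings[i].duration = CMTimeMake(1024, mParameters.audioSampleRate);
    timings[i].presentationTimeStamp = CMTimeMake(packet.timestamp, 1000); // would advance per packet
    timings[i].decodeTimeStamp = kCMTimeInvalid;
}
error = CMSampleBufferCreate(kCFAllocatorDefault, blockBuffer, true, NULL, nullptr,
                             mContext->audioFormat,
                             kNumPackets,              // numSamples
                             kNumPackets, timings,     // one timing entry per sample
                             kNumPackets, sampleSizes, // one size entry per sample
                             &buffer);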
Any ideas? Either tweaks to my calls here, or different APIs I should be using to mux together streams of encoded data, would be appreciated.
Thanks!