I am multiplexing video and audio streams. The video stream comes from generated image data; the audio stream comes from an AAC file. Some audio files are longer than the total video time I set, so my strategy is to stop muxing the audio stream once its time becomes larger than the total video time (the latter I control via the number of encoded video frames).
I won't paste the whole setup code here, but it is similar to the muxing.c example from the latest FFmpeg repo. The only difference, as I said, is that I use an audio stream from a file, not from a synthetically generated encoded frame. I am pretty sure the issue is in my incorrect sync during the muxer loop. Here is what I do:
bool AudioSetup(const char* audioInFileName)
{
    AVOutputFormat* outputF = mOutputFormatContext->oformat;
    auto audioCodecId = outputF->audio_codec;

    if (audioCodecId == AV_CODEC_ID_NONE) {
        return false;
    }

    audio_codec = avcodec_find_encoder(audioCodecId);

    avformat_open_input(&mInputAudioFormatContext, audioInFileName, 0, 0);
    avformat_find_stream_info(mInputAudioFormatContext, 0);
    av_dump_format(mInputAudioFormatContext, 0, audioInFileName, 0);

    for (size_t i = 0; i < mInputAudioFormatContext->nb_streams; i++) {
        if (mInputAudioFormatContext->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
            inAudioStream = mInputAudioFormatContext->streams[i];
            AVCodecParameters* in_codecpar = inAudioStream->codecpar;

            mAudioOutStream.st = avformat_new_stream(mOutputFormatContext, NULL);
            mAudioOutStream.st->id = mOutputFormatContext->nb_streams - 1;

            AVCodecContext* c = avcodec_alloc_context3(audio_codec);
            mAudioOutStream.enc = c;
            c->sample_fmt = audio_codec->sample_fmts[0];
            avcodec_parameters_to_context(c, inAudioStream->codecpar);

            // copy params from input to output audio stream:
            avcodec_parameters_copy(mAudioOutStream.st->codecpar, inAudioStream->codecpar);

            // audio stream ticks in samples: time_base = {1, sample_rate}
            mAudioOutStream.st->time_base.num = 1;
            mAudioOutStream.st->time_base.den = c->sample_rate;
            c->time_base = mAudioOutStream.st->time_base;

            if (mOutputFormatContext->oformat->flags & AVFMT_GLOBALHEADER) {
                c->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
            }
            break;
        }
    }
    return true;
}
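To make the intent of that {1, sample_rate} time base concrete: a duration given in seconds has to be rescaled into it before it can be compared with next_pts. A minimal sketch using the members above:

// av_rescale_q(a, bq, cq) computes a * bq / cq, so this converts the 5 s
// target duration into ticks of the audio time base {1, sample_rate}:
AVRational seconds = { 1, 1 };
int64_t audioLimit = av_rescale_q(5, seconds, mAudioOutStream.enc->time_base);
// 44100 Hz -> 220500 ticks; 16000 Hz -> 80000 ticks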
bool Encode()
{
    // mux whichever stream is behind: video goes first while its pts is not ahead of audio's
    int cc = av_compare_ts(mVideoOutStream.next_pts, mVideoOutStream.enc->time_base,
                           mAudioOutStream.next_pts, mAudioOutStream.enc->time_base);

    if (mAudioOutStream.st == NULL || cc <= 0) {
        uint8_t* data = GetYUVFrame(); // returns a ready video YUV frame to work with
        int ret = 0;
        AVPacket pkt = { 0 };
        av_init_packet(&pkt);
        pkt.size = packet->dataSize; // size of the frame buffer, tracked elsewhere in my code
        pkt.data = data;

        const int64_t duration = av_rescale_q(1, mVideoOutStream.enc->time_base, mVideoOutStream.st->time_base);
        pkt.duration = duration;
        pkt.pts = mVideoOutStream.next_pts;
        pkt.dts = mVideoOutStream.next_pts;
        mVideoOutStream.next_pts += duration;

        pkt.stream_index = mVideoOutStream.st->index;
        ret = av_interleaved_write_frame(mOutputFormatContext, &pkt);
    } else if (audio_time < video_time) {
        // 5 - duration of the video in seconds
        AVRational r = { 60, 1 };
        auto cmp = av_compare_ts(mAudioOutStream.next_pts, mAudioOutStream.enc->time_base, 5, r);
        if (cmp >= 0) {
            mAudioOutStream.next_pts = std::numeric_limits<int64_t>::max();
            return true; // don't mux audio anymore
        }

        AVPacket a_pkt = { 0 };
        av_init_packet(&a_pkt);

        int ret = av_read_frame(mInputAudioFormatContext, &a_pkt);
        // if the audio file is shorter than the video, stop muxing at the end of the file
        if (ret == AVERROR_EOF) {
            mAudioOutStream.next_pts = std::numeric_limits<int64_t>::max();
            return true;
        }
        a_pkt.stream_index = mAudioOutStream.st->index;
        av_packet_rescale_ts(&a_pkt, inAudioStream->time_base, mAudioOutStream.st->time_base);
        mAudioOutStream.next_pts += a_pkt.pts;

        ret = av_interleaved_write_frame(mOutputFormatContext, &a_pkt);
    }
    return false;
}
Now, the video part is flawless. But if the audio track is longer than the video duration, the total video length ends up longer by around 5% - 20%, and it is clear that audio is contributing to that, as the video frames finish exactly where they are supposed to.
The closest 'hack' I came up with is this part:
AVRational r = { 60, 1 };
auto cmp = av_compare_ts(mAudioOutStream.next_pts, mAudioOutStream.enc->time_base, 5, r);
if (cmp >= 0) {
    mAudioOutStream.next_pts = std::numeric_limits<int64_t>::max();
    return true;
}
Here I was trying to compare the next_pts of the audio stream with the total time set for the video file, which is 5 seconds. By setting r = {60, 1} I was converting those seconds by the time_base of the audio stream. At least that's what I believed I was doing. With this hack I get a very small deviation from the correct movie length when using standard AAC files, that is, a sample rate of 44100 Hz, stereo. But if I test with more problematic samples, like AAC with a sample rate of 16000 Hz, mono, then the video file gains almost a whole second in length.
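For reference (and as UPDATE 2 below confirms), av_compare_ts(ts_a, tb_a, ts_b, tb_b) compares the products ts_a*tb_a and ts_b*tb_b as exact rationals, so the second time base defines what the literal 5 means. A small worked reading of the call above:

// left operand:  next_pts * (1 / sample_rate) -> seconds of audio written so far
// right operand: 5 * (60 / 1) = 300           -> with r = {60, 1} each tick is
//                                                60 s, so the cutoff is 300 s
// with r = {1, 1} the right operand is exactly 5 seconds:
AVRational sec = { 1, 1 };
auto cmp = av_compare_ts(mAudioOutStream.next_pts,
                         mAudioOutStream.enc->time_base, 5, sec);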
I would appreciate it if someone could point out what I am doing wrong here.
Important note: I don't set a duration on any of the contexts. I control the termination of the muxing session myself, based on the video frame count. The audio input stream has a duration, of course, but it doesn't help me, as the video duration is what defines the movie length.
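For completeness, this is roughly how I drive the session; kTotalVideoFrames and mVideoFramesWritten are illustrative names, not my actual members:

// Hypothetical driver loop: the session ends when the requested number of
// video frames has been muxed, regardless of how much audio input remains.
const int64_t kTotalVideoFrames = 5 * 30; // e.g. 5 s at 30 fps
while (mVideoFramesWritten < kTotalVideoFrames) {
    Encode();
}
av_write_trailer(mOutputFormatContext);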
UPDATE:
This is the second bounty attempt.
UPDATE 2:
Actually, my {num, den} time base for the audio timestamp comparison was wrong, while {1, 1} is indeed the way to go, as explained by the answer. What was preventing it from working was a bug in this line (my bad):
mAudioOutStream.next_pts += a_pkt.pts;
Which must be:
mAudioOutStream.next_pts = a_pkt.pts;
The bug resulted in an exponential increment of the pts, which reached the end of the stream (in terms of pts) far too early and therefore terminated the audio stream much sooner than it was supposed to be.
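For anyone landing here later, this is the audio branch with both fixes applied ({1, 1} as the seconds time base, and assignment instead of accumulation), a sketch reusing the member names from the code above:

AVRational sec = { 1, 1 }; // plain seconds, per the answer
if (av_compare_ts(mAudioOutStream.next_pts,
                  mAudioOutStream.enc->time_base, 5, sec) >= 0) {
    mAudioOutStream.next_pts = std::numeric_limits<int64_t>::max();
    return true; // reached the 5 s video duration: stop muxing audio
}

AVPacket a_pkt = { 0 };
av_init_packet(&a_pkt);
if (av_read_frame(mInputAudioFormatContext, &a_pkt) == AVERROR_EOF) {
    mAudioOutStream.next_pts = std::numeric_limits<int64_t>::max();
    return true; // audio file is shorter than the video: stop here
}

a_pkt.stream_index = mAudioOutStream.st->index;
av_packet_rescale_ts(&a_pkt, inAudioStream->time_base,
                     mAudioOutStream.st->time_base);
mAudioOutStream.next_pts = a_pkt.pts; // track the pts, don't accumulate it
av_interleaved_write_frame(mOutputFormatContext, &a_pkt);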