I'm wanting to read the audio out of a video file as fast as possible, using the libav libraries. It's all working fine, but it seems like it could be faster.
To get a performance baseline, I ran this ffmpeg command and timed it:
time ffmpeg -threads 1 -i file -map 0:a:0 -f null -
On a test file (a 2.5gb 2hr .MOV with pcm_s16be audio) this comes out to about 1.35 seconds on my M1 Macbook Pro.
On the other hand, this minimal C code (based on FFmpeg's "Demuxing and decoding" example) is consistently around 0.3 seconds slower.
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
static int decode_packet(AVCodecContext *dec, const AVPacket *pkt, AVFrame *frame)
{
int ret = 0;
// submit the packet to the decoder
ret = avcodec_send_packet(dec, pkt);
// get all the available frames from the decoder
while (ret >= 0) {
ret = avcodec_receive_frame(dec, frame);
av_frame_unref(frame);
}
return 0;
}
int main (int argc, char **argv)
{
int ret = 0;
AVFormatContext *fmt_ctx = NULL;
AVCodecContext *dec_ctx = NULL;
AVFrame *frame = NULL;
AVPacket *pkt = NULL;
if (argc != 3) {
exit(1);
}
int stream_idx = atoi(argv[2]);
/* open input file, and allocate format context */
avformat_open_input(&fmt_ctx, argv[1], NULL, NULL);
/* get the stream */
AVStream *st = fmt_ctx->streams[stream_idx];
/* find a decoder for the stream */
AVCodec *dec = avcodec_find_decoder(st->codecpar->codec_id);
/* allocate a codec context for the decoder */
dec_ctx = avcodec_alloc_context3(dec);
/* copy codec parameters from input stream to output codec context */
avcodec_parameters_to_context(dec_ctx, st->codecpar);
/* init the decoder */
avcodec_open2(dec_ctx, dec, NULL);
/* allocate frame and packet structs */
frame = av_frame_alloc();
pkt = av_packet_alloc();
/* read frames from the specified stream */
while (av_read_frame(fmt_ctx, pkt) >= 0) {
if (pkt->stream_index == stream_idx)
ret = decode_packet(dec_ctx, pkt, frame);
av_packet_unref(pkt);
if (ret < 0)
break;
}
/* flush the decoders */
decode_packet(dec_ctx, NULL, frame);
return ret < 0;
}
I tried measuring parts of this program to see if it was spending a lot of time in the setup, but it's not – at least 1.5 seconds of the runtime is the loop where it's reading frames.
So I took some flamegraph recordings (using cargo-flamegraph) and ran each a few times to make sure the timing was consistent. There's probably some overhead since both were consistently higher than running normally, but they still have the ~0.3 second delta.
# 1.812 total
time sudo flamegraph ./minimal file 1
# 1.542 total
time sudo flamegraph ffmpeg -threads 1 -i file -map 0:a:0 -f null - 2>&1
Here are the flamegraphs stacked up, scaled so that the faster one is only 85% as wide as the slower one. (click for larger)
The interesting thing that stands out to me is how long is spent on read
in the minimal example vs. ffmpeg:
The time spent on lseek
is also a lot longer in the minimal program – it's plainly visible in that flamegraph, but in the ffmpeg flamegraph, lseek
is a single pixel wide.
What's causing this discrepancy? Is ffmpeg actually doing less work than I think it is here? Is the minimal code doing something naive? Is there some buffering or other I/O optimizations that ffmpeg has enabled?
How can I shave 0.3 seconds off of the minimal example's runtime?