I have a pipeline which is intended to capture audio and video from a Logitech C920 camera, do a little, very simple processing on it (low CPU requirements), then recompress it and mux it to a file.
The general outline of the pipeline is sketched below the platform notes; the queues are labelled Q1 to Q6.
Platform:
- Raspberry Pi 3
- Debian Jessie
- GStreamer 1.8
Don't worry about my 'simple processing' stage; my overall CPU usage sits below 25%.
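To give a concrete idea of the shape, here is a simplified stand-in written with Python and Gst.parse_launch. It is not my exact pipeline: the element choices (omxh264enc, voaacenc, matroskamux), the device names and the queue names q1-q6 are placeholders, and the 'simple processing' stage is left out.

    #!/usr/bin/env python3
    # Simplified stand-in for the pipeline shape: V4L2 video and ALSA audio are
    # captured, encoded and muxed into one file, with queues q1-q6 as decoupling
    # points. Element and device choices are placeholders.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst, GLib

    Gst.init(None)

    PIPELINE = """
      v4l2src device=/dev/video0
        ! queue name=q1 ! videoconvert
        ! queue name=q2 ! omxh264enc ! h264parse
        ! queue name=q3 ! mux.
      alsasrc device=hw:1
        ! queue name=q4 ! audioconvert ! audioresample ! voaacenc
        ! queue name=q5 ! aacparse
        ! queue name=q6 ! mux.
      matroskamux name=mux ! filesink location=capture.mkv
    """

    pipeline = Gst.parse_launch(PIPELINE)
    pipeline.set_state(Gst.State.PLAYING)
    try:
        GLib.MainLoop().run()
    except KeyboardInterrupt:
        pass

    # Send EOS so the muxer can finalise the file before shutting down.
    pipeline.send_event(Gst.Event.new_eos())
    pipeline.get_bus().timed_pop_filtered(5 * Gst.SECOND, Gst.MessageType.EOS)
    pipeline.set_state(Gst.State.NULL)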
What I find is that Q3 and Q4 slowly start filling up until one hits a threshold, and then my audio goes all choppy (and I get warnings from alsasrc: 'downstreaming is not consuming buffers fast enough'). I can put leaks on the queues, but that hardly resolves the issue.
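(By 'putting leaks on the queues' I mean capping a queue and letting it drop buffers when full, roughly like below; the name and limits are just examples, and it only hides the growing-queue symptom rather than fixing the cause.)

    # Cap a queue and let it leak (drop buffers) once it is full.
    # The queue name and the limits here are examples only.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)

    q3 = Gst.ElementFactory.make("queue", "q3")
    q3.set_property("leaky", 2)                          # 2 = leak downstream (drop old buffers)
    q3.set_property("max-size-time", 500 * Gst.MSECOND)  # hold at most 500 ms of data
    q3.set_property("max-size-buffers", 0)               # disable the buffer-count limit...
    q3.set_property("max-size-bytes", 0)                 # ...and the byte limit, so only time applies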
While the pipeline is running, this is what my queues look like (current-level-time, in milliseconds):
QUEUE CONTENTS IN MILLISECONDS

TIME (s)    Q1    Q2    Q3    Q4    Q5    Q6
       0     0     0     0     0     0     0
       5     0     0   252   380     0     0
      10     0     0   293   460     0     0
      15     0     0   332   470     0     0
      20     0     0   378   451     0     0
      25     0     0   333   460     0     0
      30     0     0   383   480     0     0
      35     0     0   500   550     0     0
      40     0     0   500   610     0     0
      45     0     0   539   630     0     0
      50     0     0   584   670     0     0
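Fill levels like these can be sampled by polling each queue's current-level-time property every few seconds; here is a rough sketch (test sources stand in for the real pipeline, and the queue names are placeholders):

    #!/usr/bin/env python3
    # Sample the fill level (current-level-time, in nanoseconds) of named queues
    # every 5 seconds and print it in milliseconds. Test sources stand in for the
    # real capture pipeline; with the real one you would look up q1..q6 instead.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst, GLib

    Gst.init(None)

    pipeline = Gst.parse_launch(
        "videotestsrc is-live=true ! queue name=q1 ! fakesink sync=true "
        "audiotestsrc is-live=true ! queue name=q2 ! fakesink sync=true"
    )
    queues = [pipeline.get_by_name(name) for name in ("q1", "q2")]

    def report_levels():
        levels = [q.get_property("current-level-time") / Gst.MSECOND for q in queues]
        print("  ".join("%6.0f" % level for level in levels))
        return True  # keep the GLib timer running

    pipeline.set_state(Gst.State.PLAYING)
    GLib.timeout_add_seconds(5, report_levels)
    GLib.MainLoop().run()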
=== EXPERIMENT ===
I removed the audio leg of the pipeline (the yellow leg in my diagram), so that I was only capturing video, and the result was better: no queues kept 'growing', and the output video was perfect.
QUEUE CONTENTS IN MILLISECONDS

TIME (s)    Q1    Q2    Q3    Q4    Q5    Q6
       0     0     0     0     0     0     0
       5     0     0     2     0     0     0
      10     0     0     5     0     0     0
      15     0     0     8     0     0     0
      20     0     0     8     0     0     0
      25     0     0     8     0     0     0
      30     0     0     8     0     0     0
      35     0     0     8     0     0     0
      40     0     0     8     0     0     0
      45     0     0     8     0     0     0
      50     0     0     8     0     0     0
Also, I tried another pipeline (I have omitted the queues from its diagram), with complete success: video recorded for at least 10 minutes with no issues.
=== THE QUESTION ===
What is going on?
My guess is that because Q3 (the video output queue) is filling up, the audio leg must be slowing things down. And because Q4 is filling up but NOT Q5, that must mean ALSA is producing audio more quickly than the AAC encoder can compress it - is that correct? However, my CPU usage is very low. I've tried two AAC encoders (voaacenc and avenc_aac) and an MP3 encoder, all with the same issue.
======== UPDATE =========
I've put a couple of identity elements directly after the audio and video sources and charted the PTS of their outputs. You can see that they very quickly start drifting away from each other: by the time the video is at 30 seconds, the audio is well behind at 21 seconds.
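For reference, this is roughly how the two PTS streams can be logged and compared, using identity elements and their 'handoff' signal (a sketch; the surrounding pipeline and element names are placeholders):

    #!/usr/bin/env python3
    # Compare audio and video PTS by hanging identity elements directly after
    # each source and listening to their "handoff" signal.
    # The surrounding pipeline is a placeholder.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst, GLib

    Gst.init(None)

    pipeline = Gst.parse_launch(
        "v4l2src device=/dev/video0 ! identity name=v_id signal-handoffs=true ! fakesink sync=true "
        "alsasrc ! identity name=a_id signal-handoffs=true ! fakesink sync=true"
    )

    last = {"video": Gst.CLOCK_TIME_NONE, "audio": Gst.CLOCK_TIME_NONE}

    def on_handoff(element, buf, which):
        last[which] = buf.pts
        # Print one line per video buffer once both streams have produced a PTS.
        if which == "video" and Gst.CLOCK_TIME_NONE not in last.values():
            print("video %6.2fs   audio %6.2fs   drift %5.2fs" % (
                last["video"] / Gst.SECOND,
                last["audio"] / Gst.SECOND,
                (last["video"] - last["audio"]) / Gst.SECOND))

    pipeline.get_by_name("v_id").connect("handoff", on_handoff, "video")
    pipeline.get_by_name("a_id").connect("handoff", on_handoff, "audio")

    pipeline.set_state(Gst.State.PLAYING)
    GLib.MainLoop().run()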
======== UPDATE 2 =========
I had a second camera, so I swapped it in, and the problem went away: the audio and video PTS values stayed in sync for at least 25 minutes. The difference with this new camera is that it's a modified C920 with a custom lens fitted. The lens coincidentally happened to be pulled completely out of focus, and that is what fixed the PTS drift (if I focus the custom lens, I get the same PTS drift).
So the question has changed a little: why does an in-focus C920 camera drift its PTS so badly? Note: I am turning off auto-exposure and setting exposure_absolute to its default of 250. I would prefer to be able to use auto-exposure, however...
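For completeness, here is how those exposure settings can be applied from GStreamer via v4l2src's extra-controls property (a sketch; exposure_auto and exposure_absolute are the control names the UVC driver exposed for the C920 on kernels of that era, where 1 selects manual exposure and 3 is aperture-priority auto):

    #!/usr/bin/env python3
    # Apply the exposure settings through v4l2src's extra-controls property.
    # Control names/values follow the UVC driver: exposure_auto=1 selects manual
    # mode (3 would be aperture-priority auto), exposure_absolute=250 is the default.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)

    src = Gst.ElementFactory.make("v4l2src", "cam")
    src.set_property("device", "/dev/video0")

    controls = Gst.Structure.new_from_string("c,exposure_auto=1,exposure_absolute=250")
    src.set_property("extra-controls", controls)

    # The same thing inline in a launch string:
    #   v4l2src extra-controls="c,exposure_auto=1,exposure_absolute=250" ! ...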