
The two streams I am decoding arrive in an MPEG-TS stream: an audio stream (ADTS AAC, 1 channel, 44100 Hz, 8-bit, 128 kbps) and a video stream (H264). I noticed something that doesn't make sense to me when I decode the AAC audio frames and try to line up the audio/video timestamps: I decode the PTS for each video and audio frame, but I only get a PTS in the audio stream every 7 frames.
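
If it helps, this is the kind of interpolation I'm considering (a minimal sketch, assuming the PES PTS, in 90 kHz units, belongs to the first ADTS frame carried in that PES packet, and that every decoded AAC frame is 1024 samples; it uses the audio start_pts from the ffprobe output below):

```python
# Minimal sketch: interpolate a PTS for every AAC frame between PES headers.
# Assumption: the PES PTS (90 kHz units) belongs to the first ADTS frame in
# that PES packet, and every decoded AAC frame is 1024 samples at 44100 Hz.
SAMPLE_RATE = 44100
SAMPLES_PER_FRAME = 1024
PTS_CLOCK = 90000  # MPEG-TS PTS/DTS run on a 90 kHz clock

def interpolated_pts(pes_pts: int, frame_index: int) -> float:
    """PTS of the Nth AAC frame after the PES header carrying pes_pts."""
    return pes_pts + frame_index * SAMPLES_PER_FRAME * PTS_CLOCK / SAMPLE_RATE

# With one PTS per 7 frames, the frames in between are spaced
# 1024 * 90000 / 44100 ≈ 2089.8 PTS ticks apart (note: not an integer):
for i in range(7):
    print(i, interpolated_pts(7560686278, i))
```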

When I decode a single audio frame I always get back 1024 samples. The frame rate is 30 fps, so I see 30 frames, each with 1024 samples, which equals 30,720 samples and not the expected 44,100. This is a problem when computing the timeline, because the timestamps on the frames differ slightly between the audio and video streams. They're very close, but since I compute the audio timestamps as (1024 samples × 1,000 / 44,100 ms × 10,000 ticks/ms), they're never going to line up exactly with the 30 fps video.
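
Here's the arithmetic I'm using, written out (the tick values are .NET-style 100 ns ticks, as in the formula above):

```python
# The arithmetic behind the mismatch: AAC frames are 1024 samples at
# 44100 Hz, so they arrive at 44100/1024 ≈ 43.07 per second, not 30.
SAMPLE_RATE = 44100
SAMPLES_PER_FRAME = 1024

audio_frame_ms = SAMPLES_PER_FRAME * 1000 / SAMPLE_RATE  # ≈ 23.2200 ms
audio_frame_ticks = audio_frame_ms * 10000               # ≈ 232199.5 ticks (100 ns)
video_frame_ticks = 10_000_000 / 30                      # ≈ 333333.3 ticks

print(SAMPLE_RATE / SAMPLES_PER_FRAME)  # 43.06640625 audio frames per second
print(audio_frame_ticks, video_frame_ticks)
```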

Am I doing something wrong decoding the audio frames with ffmpeg, or am I misunderstanding audio samples? In my particular application these timestamps are critical: I'm decoding LTC timestamps at the audio frame level and trying to line them up with video frames.
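
For concreteness, this is roughly the alignment step I have in mind (a hypothetical sketch; nearest_video_frame is my own illustration, not an ffmpeg API, and both PTS values are in 90 kHz units):

```python
# Hypothetical sketch of the alignment step: map a timestamp decoded in an
# audio frame to the index of the video frame whose PTS is closest to it.
def nearest_video_frame(audio_pts: int, video_start_pts: int,
                        fps: float = 30.0, pts_clock: int = 90000) -> int:
    """Index of the video frame whose PTS is closest to audio_pts."""
    ticks_per_video_frame = pts_clock / fps  # 3000 ticks at 30 fps
    return round((audio_pts - video_start_pts) / ticks_per_video_frame)

# Example with the start_pts values from the ffprobe output below: an LTC
# word finishing 100 audio frames into the stream lands near video frame 66.
audio_pts = 7560686278 + 100 * 1024 * 90000 // 44100
print(nearest_video_frame(audio_pts, 7560698279))
```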

FFProbe.exe output:

Video:
r_frame_rate=30/1      
avg_frame_rate=30/1    
codec_time_base=1/60
time_base=1/90000      
start_pts=7560698279   
start_time=84007.758656

Audio:
r_frame_rate=0/0
avg_frame_rate=0/0
codec_time_base=1/44100
time_base=1/90000
start_pts=7560686278
start_time=84007.625311
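
As a sanity check on those numbers, the start_time values are just start_pts expressed in the 1/90000 time_base, and the two streams start about 133 ms apart:

```python
# start_time = start_pts / 90000 for both streams.
print(7560698279 / 90000)                 # 84007.75865... (video start_time)
print(7560686278 / 90000)                 # 84007.62531... (audio start_time)
print((7560698279 - 7560686278) / 90000)  # 0.13334... s offset between streams
```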
  • I think the real solution will be utilizing the PesHeader.Pts timestamps from the MPEG-TS stream, though I don't fully understand how to utilize them yet since they don't exist at the frame level. But my real misunderstanding is why 30 audio frames × 1024 samples != 44,100. – Michael Brown Jan 21 '21 at 00:42
  • Audio framerate != video framerate. As you noticed, decoded AAC frames have 1024 samples, so audio framerate = sample_rate/1024, and audio frame duration is its reciprocal. – Gyan Jan 21 '21 at 04:29
  • So audio frames are coming in at 43.07 fps and video at 30 fps, but if I measure the time between 30 audio frames it's exactly 1.0 s. The reason I'm trying to fully understand this is that I'm decoding LTC timecode in the audio frames and trying to sync it up with the video frames. – Michael Brown Jan 21 '21 at 20:37
  • If you run the ashowinfo filter on a few seconds of data, you'll see the cadence of audio timestamps. – Gyan Jan 22 '21 at 05:21
  • Thanks for that tip, I wasn't aware of that filter. I'm slowly getting a better understanding of how the audio streams work, and right now I'm trying to use the PTS timestamps to line the audio up with the video frames approximately. It's somewhat weird, though, that there is so much jitter in the audio PTS timestamps: they aren't evenly spaced like the video frames, and a PTS is only present in every 7th audio TS packet. – Michael Brown Jan 23 '21 at 00:31
  • @Gyan is there some kind of trick to get the ashowinfo filter to work with a UDP stream in ffprobe? I'm not having much luck here. – Michael Brown Jan 23 '21 at 01:15
  • Couldn't find a way to get ashowinfo to work with UDP streams: `ffprobe -f lavfi "amovie='udp\:\/\/@238.0.0.1:1234',ashowinfo" -show_frames` seems to fail to connect; not sure if there is other syntax or not. I did get it to work with a ts file using `ffprobe -f lavfi "amovie='udpdata.ts',ashowinfo" -show_frames >out 2>outerr`. – Michael Brown Jan 23 '21 at 01:34
