When I stack two videos using vstack, the result has audio sync issues for the bottom video.
My starting point is four separate RTP tracks captured from a two-person video chat:
Actor1Video.webm
Actor1Audio.webm
Actor2Video.webm
Actor2Audio.webm
I use vstack to put Actor1 on top and Actor2 on the bottom:
ffmpeg -i Actor1Video.webm -i Actor2Video.webm -i Actor1Audio.webm -i Actor2Audio.webm -filter_complex "[1][0]scale2ref=oh*mdar:ih[2nd][ref];[ref][2nd]vstack=inputs=2[v];[2:a][3:a]join=inputs=2:channel_layout=stereo:map=0.0-FL|1.0-FR[a]" -c:a libfdk_aac -map "[v]" -map "[a]" -vsync 2 ActorsCombined.mp4
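For readability, here is the same command as a Python argument list with each filter chain annotated. Python is only my choice for illustration, and the comments are my reading of the FFmpeg filter documentation. scale2ref scales Actor2 to Actor1's height while keeping Actor2's aspect ratio, vstack stacks the two, and join builds a stereo track from the first channel of each audio file. The filtergraph string is unchanged from the command above.

```python
import shlex

# The filtergraph from the command above, split into its three chains.
filter_chains = [
    # [1][0]: input 1 (Actor2, 320x180) is scaled, input 0 (Actor1, 1280x720)
    # is the reference. h = ih (reference height, 720); w = oh*mdar (output
    # height times the scaled input's display aspect ratio, 16:9) -> 1280x720.
    "[1][0]scale2ref=oh*mdar:ih[2nd][ref]",
    # Actor1 ([ref]) on top, scaled Actor2 ([2nd]) below -> 1280x1440.
    "[ref][2nd]vstack=inputs=2[v]",
    # Channel 0 of input 0 (Actor1Audio) -> FL, channel 0 of input 1 -> FR.
    "[2:a][3:a]join=inputs=2:channel_layout=stereo:map=0.0-FL|1.0-FR[a]",
]

args = [
    "ffmpeg",
    "-i", "Actor1Video.webm",
    "-i", "Actor2Video.webm",
    "-i", "Actor1Audio.webm",
    "-i", "Actor2Audio.webm",
    "-filter_complex", ";".join(filter_chains),
    "-c:a", "libfdk_aac",
    "-map", "[v]",
    "-map", "[a]",
    "-vsync", "2",
    "ActorsCombined.mp4",
]

# Print a shell-safe version of the command (brackets and | get quoted).
print(shlex.join(args))
```

Passing `args` to `subprocess.run(args, check=True)` would execute it, assuming an ffmpeg build with libfdk_aac is on the PATH.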
Here's the log:
ffmpeg version git-2021-02-08-89f78dd Copyright (c) 2000-2021 the FFmpeg developers
built with Apple clang version 11.0.3 (clang-1103.0.32.62)
configuration: --prefix=/usr/local/Cellar/ffmpeg/HEAD-89f78dd_6 --enable-shared --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libaom --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-demuxer=dash --disable-libjack --disable-indev=jack --enable-opencl --enable-videotoolbox --disable-htmlpages --enable-libfdk-aac --enable-nonfree
libavutil 56. 64.100 / 56. 64.100
libavcodec 58.121.100 / 58.121.100
libavformat 58. 67.100 / 58. 67.100
libavdevice 58. 11.103 / 58. 11.103
libavfilter 7.103.100 / 7.103.100
libswscale 5. 8.100 / 5. 8.100
libswresample 3. 8.100 / 3. 8.100
libpostproc 55. 8.100 / 55. 8.100
Input #0, matroska,webm, from 'Actor1Video.webm':
Metadata:
title : FFmpeg
ENCODER : Lavf58.29.100
Duration: 447576:28:17.41, start: 1611273978.135000, bitrate: N/A
Stream #0:0: Video: vp8, yuv420p(tv, bt470bg/unknown/unknown, progressive), 1280x720, SAR 1:1 DAR 16:9, 29.97 fps, 29.97 tbr, 1k tbn, 1k tbc (default)
Metadata:
DURATION : 447576:28:17.408999
Input #1, matroska,webm, from 'Actor2Video.webm':
Metadata:
title : FFmpeg
ENCODER : Lavf58.29.100
Duration: 447576:28:17.45, start: 1611273978.257000, bitrate: N/A
Stream #1:0: Video: vp8, yuv420p(tv, bt470bg/unknown/unknown, progressive), 320x180, SAR 1:1 DAR 16:9, 29.97 fps, 29.97 tbr, 1k tbn, 1k tbc (default)
Metadata:
DURATION : 447576:28:17.453999
Input #2, matroska,webm, from 'Actor1Audio.webm':
Metadata:
title : FFmpeg
ENCODER : Lavf58.29.100
Duration: 447576:28:17.49, start: 1611273978.112000, bitrate: N/A
Stream #2:0: Audio: opus, 48000 Hz, stereo, fltp (default)
Metadata:
DURATION : 447576:28:17.492000
Input #3, matroska,webm, from 'Actor2Audio.webm':
Metadata:
title : FFmpeg
ENCODER : Lavf58.29.100
Duration: 447576:28:17.45, start: 1611273978.208000, bitrate: N/A
Stream #3:0: Audio: opus, 48000 Hz, stereo, fltp (default)
Metadata:
DURATION : 447576:28:17.447999
File 'ActorsCombined.mp4' already exists. Overwrite? [y/N] y
Stream mapping:
Stream #0:0 (vp8) -> scale2ref:ref
Stream #1:0 (vp8) -> scale2ref:default
Stream #2:0 (opus) -> join:input0
Stream #3:0 (opus) -> join:input1
vstack -> Stream #0:0 (libx264)
join -> Stream #0:1 (libfdk_aac)
Press [q] to stop, [?] for help
[libx264 @ 0x7ff0c1831a00] using SAR=1/1
[libx264 @ 0x7ff0c1831a00] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x7ff0c1831a00] profile High, level 4.0, 4:2:0, 8-bit
[libx264 @ 0x7ff0c1831a00] 264 - core 161 r3043 59c0609 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'ActorsCombined.mp4':
Metadata:
title : FFmpeg
encoder : Lavf58.67.100
Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 1280x1440 [SAR 1:1 DAR 8:9], q=2-31, 29.97 fps, 30k tbn (default)
Metadata:
encoder : Lavc58.121.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
Stream #0:1: Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, s16, 139 kb/s (default)
Metadata:
encoder : Lavc58.121.100 libfdk_aac
frame=36626 fps= 15 q=-1.0 Lsize= 389420kB time=00:21:59.38 bitrate=2417.9kbits/s dup=0 drop=34791 speed=0.535x
video:365641kB audio:22446kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.343645%
[libx264 @ 0x7ff0c1831a00] frame I:158 Avg QP:15.51 size:107833
[libx264 @ 0x7ff0c1831a00] frame P:9670 Avg QP:18.71 size: 25824
[libx264 @ 0x7ff0c1831a00] frame B:26798 Avg QP:24.90 size: 4018
[libx264 @ 0x7ff0c1831a00] consecutive B-frames: 0.6% 5.2% 0.6% 93.5%
[libx264 @ 0x7ff0c1831a00] mb I I16..4: 13.2% 75.5% 11.3%
[libx264 @ 0x7ff0c1831a00] mb P I16..4: 1.2% 3.6% 0.2% P16..4: 43.1% 10.4% 5.9% 0.0% 0.0% skip:35.6%
[libx264 @ 0x7ff0c1831a00] mb B I16..4: 0.1% 0.1% 0.0% B16..8: 28.3% 0.7% 0.1% direct: 2.3% skip:68.5% L0:45.1% L1:53.6% BI: 1.3%
[libx264 @ 0x7ff0c1831a00] 8x8 transform intra:71.6% inter:85.4%
[libx264 @ 0x7ff0c1831a00] coded y,uvDC,uvAC intra: 50.4% 77.2% 47.8% inter: 6.9% 17.0% 3.8%
[libx264 @ 0x7ff0c1831a00] i16 v,h,dc,p: 37% 28% 14% 22%
[libx264 @ 0x7ff0c1831a00] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 25% 17% 25% 4% 6% 7% 5% 6% 5%
[libx264 @ 0x7ff0c1831a00] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 35% 24% 16% 4% 6% 5% 4% 4% 2%
[libx264 @ 0x7ff0c1831a00] i8c dc,h,v,p: 60% 16% 17% 6%
[libx264 @ 0x7ff0c1831a00] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x7ff0c1831a00] ref P L0: 63.1% 9.9% 20.5% 6.6%
[libx264 @ 0x7ff0c1831a00] ref B L0: 90.0% 8.9% 1.1%
[libx264 @ 0x7ff0c1831a00] ref B L1: 94.7% 5.3%
[libx264 @ 0x7ff0c1831a00] kb/s:2270.36
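As a sanity check on the numbers in the log, here is a small Python sketch. Python is just a calculator here and is not part of the pipeline. The start: values look like Unix epoch seconds (1611273978 falls on 22 January 2021). The roughly 447576-hour durations are about the same magnitude, so I am assuming the demuxer is reporting the last timestamp rather than a length. Under that assumption each track is about 1319 s (21:59) long, which matches time=00:21:59.38 in the progress line, and all four tracks start within 0.145 s of each other. The same arithmetic on frame= and drop= shows that almost as many frames were dropped as were written.

```python
# Figures copied from the ffmpeg log above.
# Each input: (reported "Duration:" string, reported "start:" in seconds).
inputs = {
    "Actor1Video.webm": ("447576:28:17.41", 1611273978.135),
    "Actor2Video.webm": ("447576:28:17.45", 1611273978.257),
    "Actor1Audio.webm": ("447576:28:17.49", 1611273978.112),
    "Actor2Audio.webm": ("447576:28:17.45", 1611273978.208),
}


def to_seconds(hms: str) -> float:
    """Convert an ffmpeg 'H:MM:SS.ss' duration string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


earliest = min(start for _, start in inputs.values())
report = {}
for name, (duration, start) in inputs.items():
    # Assumption: the huge "duration" is the last timestamp counted from 0,
    # so subtracting the start timestamp gives the real track length.
    length = to_seconds(duration) - start
    offset = start - earliest  # seconds after the earliest-starting track
    report[name] = (length, offset)
    print(f"{name}: {length:.3f} s long, starts +{offset:.3f} s")

# Final progress line: frame=36626 ... time=00:21:59.38 ... drop=34791
frames_written, frames_dropped = 36626, 34791
elapsed = 21 * 60 + 59.38
written_fps = frames_written / elapsed
graph_fps = (frames_written + frames_dropped) / elapsed  # leaving the filter graph
print(f"written: {written_fps:.2f} fps, out of the filter graph: {graph_fps:.2f} fps")
```

The graph produced about 54 frames per second but only about 27.8 were written. One possible explanation, which is my assumption and not something the log proves, is that vstack emits an output frame whenever either input delivers a new one. Two unaligned ~30 fps inputs would then yield close to 60 frames per second, and -vsync 2 would drop the frames whose timestamps collide.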
The resulting file begins in sync, but after a few minutes the bottom video is suddenly out of sync with its audio.
The strange thing is that if I mux each video with its own audio separately, without vstack, there is no sync issue:
ffmpeg -i Actor1Video.webm -i Actor1Audio.webm -vsync 2 Actor1.mp4 &&
ffmpeg -i Actor2Video.webm -i Actor2Audio.webm -vsync 2 Actor2.mp4
When I do the above, the two videos are perfectly in sync. But if I take these two mp4s and stack them, I have the same issue where the bottom video goes out of sync.
Any suggestions?
UPDATE
This question does not appear to be a duplicate of anything on this site, although, as @llogan noted, other users have had problems with WebRTC timestamps. Surely WebRTC recordings can't simply be impossible to sync?