1

I am trying to combine the following:

(a) : 29s video clip that has its own audio that lasts the entire duration

(b) : audio clip I want to play at the start of the video, in conjunction with original audio, and is ~2 seconds long

I successfully use 'amix' to obtain a video at the end with combined audio, but the problem is that the final video's audio cuts off at around 26 out of the 29 seconds of the video and goes silent.

What doesn't make any sense is that the resulting video plays as it should, with the audio successfully mixed. But the output video's audio stream loses the last 3 seconds.

Here's the 'amix' command I'm using (sending via subprocess):

subprocess.call(['ffmpeg','-i', input.mp4', '-i', "audioclip.mp3", '-filter_complex', 'amix', output.mp4'])

I've also used versions of this command that spell out the -map "0:a" and -map "1:a", or tried using 'amix=inputs=2:duration:longest' among many other additions. All lead to the same problem: the final combined video's audio drops out with 3 seconds remaining in the video, even though the initial 'input.mp4' video has a full 29 out of 29 seconds of audio.

Does anyone know why these last several seconds of audio from [a] are missing in the final video?

_________________________________________________________________

edit: Below is my output when I run the amix command listed above:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'RuneBearinstakill_advanced.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf59.20.101
  Duration: 00:00:29.77, start: 0.000000, bitrate: 5441 kb/s
  Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt470bg/bt470bg/smpte170m, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 5304 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
    Metadata:
      handler_name    : Bento4 Video Handler
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : Bento4 Sound Handler
      vendor_id       : [0][0][0][0]
[mp3 @ 000001f0c8ec2040] Estimating duration from bitrate, this may be inaccurate
Input #1, mp3, from 'TTS_clip.mp3':
  Duration: 00:00:01.90, start: 0.000000, bitrate: 32 kb/s
  Stream #1:0: Audio: mp3, 24000 Hz, mono, fltp, 32 kb/s
Stream mapping:
  Stream #0:1 (aac) -> amix (graph 0)
  Stream #1:0 (mp3float) -> amix (graph 0)
  amix:default (graph 0) -> Stream #0:0 (aac)
  Stream #0:0 -> #0:1 (h264 (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[libx264 @ 000001f0c8cbe5c0] using SAR=1/1
[libx264 @ 000001f0c8cbe5c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 000001f0c8cbe5c0] profile High, level 4.0, 4:2:0, 8-bit
[libx264 @ 000001f0c8cbe5c0] 264 - core 164 r3094 bfc87b7 - H.264/MPEG-4 AVC codec - Copyleft 2003-2022 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=24 lookahead_threads=4 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'RuneBearinstakill_advancedwithtts.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf59.20.101
  Stream #0:0: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc59.25.100 aac
  Stream #0:1(eng): Video: h264 (avc1 / 0x31637661), yuv420p(tv, bt470bg/bt470bg/smpte170m, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 30 fps, 15360 tbn (default)
    Metadata:
      handler_name    : Bento4 Video Handler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc59.25.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame=  893 fps=110 q=-1.0 Lsize=   18717kB time=00:00:29.66 bitrate=5168.5kbits/s speed=3.66x    
video:18256kB audio:433kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.150179%
[aac @ 000001f0c8f9ebc0] Qavg: 921.259
[libx264 @ 000001f0c8cbe5c0] frame I:4     Avg QP:21.33  size: 71366
[libx264 @ 000001f0c8cbe5c0] frame P:633   Avg QP:23.32  size: 23837
[libx264 @ 000001f0c8cbe5c0] frame B:256   Avg QP:25.22  size: 12968
[libx264 @ 000001f0c8cbe5c0] consecutive B-frames: 57.2% 10.3% 10.1% 22.4%
[libx264 @ 000001f0c8cbe5c0] mb I  I16..4: 17.9% 71.4% 10.8%
[libx264 @ 000001f0c8cbe5c0] mb P  I16..4:  6.9% 17.6%  0.8%  P16..4: 43.1%  6.5%  1.5%  0.0%  0.0%    skip:23.6%
[libx264 @ 000001f0c8cbe5c0] mb B  I16..4:  1.5%  4.2%  0.3%  B16..8: 39.7%  4.6%  0.5%  direct: 1.6%  skip:47.6%  L0:55.9% L1:41.8% BI: 2.3%
[libx264 @ 000001f0c8cbe5c0] 8x8 transform intra:69.5% inter:87.3%
[libx264 @ 000001f0c8cbe5c0] coded y,uvDC,uvAC intra: 35.6% 26.8% 0.8% inter: 13.4% 10.8% 0.0%
[libx264 @ 000001f0c8cbe5c0] i16 v,h,dc,p: 21% 37% 12% 30%
[libx264 @ 000001f0c8cbe5c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 25% 26% 21%  4%  5%  5%  6%  4%  5%
[libx264 @ 000001f0c8cbe5c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 24% 28% 15%  5%  7%  7%  7%  5%  4%
[libx264 @ 000001f0c8cbe5c0] i8c dc,h,v,p: 67% 18% 14%  1%
[libx264 @ 000001f0c8cbe5c0] Weighted P-Frames: Y:0.2% UV:0.0%
[libx264 @ 000001f0c8cbe5c0] ref P L0: 72.3% 15.4%  8.7%  3.6%  0.0%
[libx264 @ 000001f0c8cbe5c0] ref B L0: 88.9%  9.5%  1.6%
[libx264 @ 000001f0c8cbe5c0] ref B L1: 97.7%  2.3%
[libx264 @ 000001f0c8cbe5c0] kb/s:5024.13

And here is the output when I check the stream durations for the input video and the output video, showing how the output video's audio stream is somehow reduced by several seconds after the amix:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'RuneBearinstakill_advanced.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf59.20.101
  Duration: 00:00:29.77, start: 0.000000, bitrate: 5403 kb/s
  Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt470bg/bt470bg/smpte170m, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 5266 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
    Metadata:
      handler_name    : Bento4 Video Handler
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : Bento4 Sound Handler
      vendor_id       : [0][0][0][0]
[STREAM]
duration=29.766667
[/STREAM]
[STREAM]
duration=29.738000
[/STREAM]

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'RuneBearinstakill_advancedwithtts.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf59.20.101
  Duration: 00:00:29.77, start: 0.000000, bitrate: 5098 kb/s
  Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt470bg/bt470bg/smpte170m, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 4971 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
    Metadata:
      handler_name    : Bento4 Video Handler
      vendor_id       : [0][0][0][0]
[STREAM]
duration=27.477000
[/STREAM]
[STREAM]
duration=29.766667
kilika
  • 31
  • 6
  • Could you add the initial log (input thru output listings)? Also was there any warning messages popping up during transcoding? – kesh Apr 08 '22 at 03:09
  • Thank you for the feedback. I just posted the output above. I didn't see any warnings aside from "[mp3 @ 000001f0c8ec2040] Estimating duration from bitrate, this may be inaccurate" on the .mp3 file. Is that relevant? – kilika Apr 08 '22 at 07:46
  • I don't think so (unless 1.9s it's reporting is incorrect). I don't see anything wrong with the log. A couple things you can try: (1) run `ffprobe -show_entries stream=duration FILE` on input and output video files involved and make sure audio stream durations match (or short in case of the output) (2) try adding `aresample=48k` to mp3 input to explicitly match the sampling rate before mixing (doubt this would fix it) (3) Try divide-n-conquer to trouble shoot. e.g., (a) transcode video w/out mixing (b) mix audio to an audio file etc. Break down the process until you find which step is failing. – kesh Apr 08 '22 at 13:44
  • @kesh I did (1) and it did show that the final output video's audio stream is in fact shorter by ~2 seconds than the video stream (27 vs 29 seconds). In the input video, the audio stream is identical to the video stream, both at 29 seconds. I'm still really confused why these 2 seconds of audio stream are being chopped off by `amix.` Will try the rest of your suggestions. – kilika Apr 12 '22 at 22:17
  • I found the fix. It turns out I needed to set the input video's audio stream to `aresample=aysnc=1`. Ultimately my amix command looked like this: `'[0:a]aresample=async=1[0a];[1:a]volume=2.0[1a];[0a][1a]amix=inputs=2'` – kilika Apr 12 '22 at 23:16

1 Answers1

2

I found the fix. It turned out I needed to set the input video's audio stream to aresample=1=async in the filter_complex for the amix command.

aresample=aysnc=1

Ultimately my amix command looked like this:

'[0:a]aresample=async=1[0a];[1:a]volume=2.0[1a];[0a][1a]amix=inputs=2'

I found this kind of solution from a similar question over at superuser : https://superuser.com/questions/1234493/ffmpeg-amix-audio-to-video-with-some-audio-in-parts

kilika
  • 31
  • 6
  • 1
    The answer over at superuser describes the reasoning for doing this as: "My guess is your audio stream isn't continuous and only has packets corresponding to content. Pad in the missing samples in the video's audio" – kilika Apr 12 '22 at 23:21