Superimposing two videos onto a static image?

Question

I have two videos that I'd like to combine into a single video, in which both videos would sit on top of a static background image. (Think something like this.) My requirements are that the software I use is free, that it runs on OSX, and that I don't have to re-encode my videos an excessive number of times. I'd also like to be able to perform this operation from the command line or via script, since I'll be doing it a lot. (But this isn't strictly necessary.)

I tried fiddling with ffmpeg for a couple of hours, but it just doesn't seem very well suited for post-processing. I could potentially hack something together via the overlay feature, but so far I haven't figured out how to do it, aside from painstakingly converting the image to a video (which takes 2x as long as the length of my videos!) and then superimposing the two videos onto it in another rendering step.

Any tips? Thank you!

Update:

Thanks to LordNeckbeard's help, I was able to achieve my desired result with a single ffmpeg call! Unfortunately, encoding is quite slow, taking 6 seconds to encode 1 second of video. I believe this is caused by the background image. Any tips on speeding up encoding? Here's the ffmpeg log:

MacBook-Pro:Video archagon$ ffmpeg -loop 1 -i underlay.png -i test-slide-video-short.flv -i test-speaker-video-short.flv -filter_complex "[1:0]scale=400:-1[a];[2:0]scale=320:-1[b];[0:0][a]overlay=0:0[c];[c][b]overlay=0:0" -shortest -t 5 -an output.mp4
ffmpeg version 1.0 Copyright (c) 2000-2012 the FFmpeg developers
  built on Nov 14 2012 16:18:58 with Apple clang version 4.0 (tags/Apple/clang-421.0.60) (based on LLVM 3.1svn)
  configuration: --prefix=/opt/local --enable-swscale --enable-avfilter --enable-libmp3lame --enable-libvorbis --enable-libopus --enable-libtheora --enable-libschroedinger --enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libspeex --mandir=/opt/local/share/man --enable-shared --enable-pthreads --cc=/usr/bin/clang --arch=x86_64 --enable-yasm --enable-gpl --enable-postproc --enable-libx264 --enable-libxvid
  libavutil      51. 73.101 / 51. 73.101
  libavcodec     54. 59.100 / 54. 59.100
  libavformat    54. 29.104 / 54. 29.104
  libavdevice    54.  2.101 / 54.  2.101
  libavfilter     3. 17.100 /  3. 17.100
  libswscale      2.  1.101 /  2.  1.101
  libswresample   0. 15.100 /  0. 15.100
  libpostproc    52.  0.100 / 52.  0.100
Input #0, image2, from 'underlay.png':
  Duration: 00:00:00.04, start: 0.000000, bitrate: N/A
    Stream #0:0: Video: png, rgb24, 1024x768, 25 fps, 25 tbr, 25 tbn, 25 tbc
Input #1, flv, from 'test-slide-video-short.flv':
  Metadata:
    author          : 
    copyright       : 
    description     : 
    keywords        : 
    rating          : 
    title           : 
    presetname      : Custom
    videodevice     : VGA2USB Pro V3U30343
    videokeyframe_frequency: 5
    canSeekToEnd    : false
    createdby       : FMS 3.5
    creationdate    : Mon Aug 16 16:35:34 2010
    encoder         : Lavf54.29.104
  Duration: 00:50:32.75, start: 0.000000, bitrate: 90 kb/s
    Stream #1:0: Video: vp6f, yuv420p, 640x480, 153 kb/s, 8 tbr, 1k tbn, 1k tbc
Input #2, flv, from 'test-speaker-video-short.flv':
  Metadata:
    author          : 
    copyright       : 
    description     : 
    keywords        : 
    rating          : 
    title           : 
    presetname      : Custom
    videodevice     : Microsoft DV Camera and VCR
    videokeyframe_frequency: 5
    audiodevice     : Microsoft DV Camera and VCR
    audiochannels   : 1
    audioinputvolume: 75
    canSeekToEnd    : false
    createdby       : FMS 3.5
    creationdate    : Mon Aug 16 16:35:34 2010
    encoder         : Lavf54.29.104
  Duration: 00:50:38.05, start: 0.000000, bitrate: 238 kb/s
    Stream #2:0: Video: vp6f, yuv420p, 320x240, 204 kb/s, 25 tbr, 1k tbn, 1k tbc
    Stream #2:1: Audio: mp3, 22050 Hz, mono, s16, 32 kb/s
File 'output.mp4' already exists. Overwrite ? [y/N] y
using cpu capabilities: none!
[libx264 @ 0x7fa84c02f200] profile High, level 3.1
[libx264 @ 0x7fa84c02f200] 264 - core 119 - H.264/MPEG-4 AVC codec - Copyleft 2003-2011 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'output.mp4':
  Metadata:
    encoder         : Lavf54.29.104
    Stream #0:0: Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 1024x768, q=-1--1, 25 tbn, 25 tbc
Stream mapping:
  Stream #0:0 (png) -> overlay:main
  Stream #1:0 (vp6f) -> scale
  Stream #2:0 (vp6f) -> scale
  overlay -> Stream #0:0 (libx264)
Press [q] to stop, [?] for help

Update 2:

It works! One important tweak was to move the underlay.png input to the end of the input list. This increased performance substantially. Here's my final ffmpeg call. (The maps at the end aren't required for this particular arrangement, but I sometimes have a few extra audio inputs that I want to map to my output.)

ffmpeg \
    -i VideoOne.flv \
    -i VideoTwo.flv \
    -loop 1 -i Underlay.png \
    -filter_complex "[2:0] [0:0] overlay=20:main_h/2-overlay_h/2 [overlay];[overlay] [1:0] overlay=main_w-overlay_w-20:main_h/2-overlay_h/2 [output]" \
    -map "[output]:v" \
    -map 0:a \
    OutputVideo.m4v

llogan · Accepted Answer · 2013-11-18T21:54:08.827

Complex filtergraphs in ffmpeg may seem complicated at first, but they make sense once you have tried them a few times. You need to be familiar with the filtergraph syntax. Start by reading Filtering Introduction and Filtergraph Description. You do not have to understand them completely, but they will help you understand the following example.

Example

two videos over static image

Use the scale video filter to scale (resize) the inputs to a specific size, and then use the overlay video filter to place the videos over the static image.

ffmpeg -loop 1 -i background.png -i video1.mp4 -i video2.mp4 -filter_complex \
"[1:v]scale=(iw/2)-20:-1[a]; \
 [2:v]scale=(iw/2)-20:-1[b]; \
 [0:v][a]overlay=10:(main_h/2)-(overlay_h/2):shortest=1[c]; \
 [c][b]overlay=main_w-overlay_w-10:(main_h/2)-(overlay_h/2)[video]" \
-map "[video]" output.mkv

What this means

Non-filtering options:

-loop 1 Continuously loop the next input which is background.png.
background.png The background image. The stream specifier is [0:v]. It is sized 1280x720.
video1.mp4 The first video input (Big Buck Bunny in the example image). The stream specifier is [1:v]. It is sized 640x360.
video2.mp4 The second video input (the varmints in the example image). The stream specifier is [2:v]. It is sized 640x360.

Filtering options

-filter_complex The option to start the complex filtergraph.
[1:v]scale=(iw/2)-20:-1[a] This takes video1.mp4, referred to as [1:v], and scales it. iw is an alias for input width, which in this case is 640. We divide that in half and subtract an additional 20 pixels as padding so there will be space around each video when it is overlaid. -1 means to automatically calculate a value that will preserve the aspect ratio. Of course you can omit the fanciness and manually provide values such as scale=320:240. Then use an output link label named [a] so we can refer to this output later.
[2:v]scale=(iw/2)-20:-1[b] Same as above, but use video2.mp4 as the input and name the output link label as [b].
[0:v][a]overlay=10:(main_h/2)-(overlay_h/2):shortest=1[c] Use background.png as first overlay input, and use the results of our first scale filter, referred to as [a], as the second overlay input. Place [a] over [0:v]. main_h is an alias for main height which refers to the background input ([0:v]) height. overlay_h is an alias for overlay height and refers to the height of the foreground ([a]). This example will place Big Buck Bunny on the left side. shortest=1 will force the output to terminate when the shortest input terminates; otherwise it will loop forever since background.png is looping. Name the results of this filter [c].
[c][b]overlay=main_w-overlay_w-10:(main_h/2)-(overlay_h/2)[video] Use [c] as the first overlay input and [b] as the second overlay input. main_w is an alias for the width of the main input ([c], which has the same size as the background) and overlay_w is the width of the foreground ([b]), so main_w-overlay_w-10 leaves 10 pixels between the video and the right edge. The vertical position is centered in the same way as in the previous overlay. This example will place the verminy varmints on the right side. Label the output as [video].
-map "[video]" Map the output from the filter to the output file. The [video] link label at the end of the filtergraph is not necessarily required but it is recommended to be explicit with mapping.
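
To make the arithmetic concrete, here is a small sketch that works the expressions through by hand for the sizes used in this example (a 1280x720 background and 640x360 videos). The variable names are made up for illustration. The scaled height is only approximate: shell arithmetic truncates, while ffmpeg picks the height itself when given -1, so the real value may differ by a pixel.

```shell
#!/bin/sh
# Sizes taken from the example above.
bg_w=1280; bg_h=720      # background.png
in_w=640;  in_h=360      # video1.mp4 and video2.mp4

# scale=(iw/2)-20:-1
scaled_w=$(( in_w / 2 - 20 ))
# -1 keeps the aspect ratio; integer division here is only an approximation.
scaled_h=$(( scaled_w * in_h / in_w ))

# overlay=10:(main_h/2)-(overlay_h/2)
left_x=10
y=$(( bg_h / 2 - scaled_h / 2 ))

# overlay=main_w-overlay_w-10:(main_h/2)-(overlay_h/2)
right_x=$(( bg_w - scaled_w - 10 ))

echo "scaled size:    ${scaled_w}x${scaled_h}"
echo "left video at:  ${left_x},${y}"
echo "right video at: ${right_x},${y}"
```

With these sizes each video ends up 300 pixels wide, the left one starts at x=10 and the right one at x=970, both vertically centered.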

Audio

Have two separate audio streams

By default only the first input audio stream encountered will be used in the output, as defined in Stream Selection. You can use the -map option to add an additional audio track from the second video input (the output will have two audio streams). This example will stream copy the audio instead of re-encoding:

ffmpeg -loop 1 -i background.png -i video1.mp4 -i video2.mp4 -filter_complex \
"[1:v]scale=(iw/2)-20:-1[a]; \
 [2:v]scale=(iw/2)-20:-1[b]; \
 [0:v][a]overlay=10:(main_h/2)-(overlay_h/2):shortest=1[c]; \
 [c][b]overlay=main_w-overlay_w-10:(main_h/2)-(overlay_h/2)[video]" \
-map "[video]" -map 1:a -map 2:a -codec:a copy output.mkv

Combine both audio streams

Or combine both audio inputs into one using the amerge and pan audio filters (assuming both inputs are stereo and you want stereo output):

ffmpeg -loop 1 -i background.png -i video1.mp4 -i video2.mp4 -filter_complex \
"[1:v]scale=(iw/2)-20:-1[a]; \
 [2:v]scale=(iw/2)-20:-1[b]; \
 [0:v][a]overlay=10:(main_h/2)-(overlay_h/2):shortest=1[c]; \
 [c][b]overlay=main_w-overlay_w-10:(main_h/2)-(overlay_h/2)[video]; \
 [1:a][2:a]amerge,pan=stereo|c0<c0+c2|c1<c1+c3[audio]" \
-map "[video]" -map "[audio]" output.mkv
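
The merge above assumes stereo inputs. As a rough sketch of how the channel numbering changes with the input layout, the hypothetical helper below (audio_filter is a made-up name, not an ffmpeg feature) prints a filter chain for two stereo or two mono inputs. It writes amerge=inputs=2 explicitly and separates the pan arguments with |, which is the form current ffmpeg builds expect. The mono case is an assumption about what you would want (both sources in both speakers), not something from the original example.

```shell
#!/bin/sh
# Hypothetical helper (not part of ffmpeg): print an audio filter chain for
# inputs 1 and 2, given how many channels each one has.
audio_filter() {
    case "$1,$2" in
        2,2)
            # stereo + stereo: amerge yields c0 c1 (input 1) and c2 c3 (input 2)
            echo '[1:a][2:a]amerge=inputs=2,pan=stereo|c0<c0+c2|c1<c1+c3[audio]'
            ;;
        1,1)
            # mono + mono: amerge yields c0 (input 1) and c1 (input 2)
            echo '[1:a][2:a]amerge=inputs=2,pan=stereo|c0<c0+c1|c1<c0+c1[audio]'
            ;;
        *)
            echo "unsupported channel counts: $1 and $2" >&2
            return 1
            ;;
    esac
}

audio_filter 2 2
audio_filter 1 1
```

The printed chain can be appended to the video part of the filtergraph after a semicolon. Using < instead of = in pan renormalizes the gains so the mixed channels do not clip.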


Thank you so much for the explanation! I tried fiddling with filter_complex before posting my question, but it's a hairy beast. As I'm testing this, one issue that I'm noticing is that each second of video takes about 6 seconds to encode. I believe this is caused by my background image. Do you know if there's any way to speed this up? — Archagon, Nov 15 '12 at 22:03
@Archagon Not without seeing your ffmpeg command and the complete console output. You can update your question with the additional information. — llogan, Nov 15 '12 at 22:04
LordNeckbeard, I added the log. I really appreciate the help. As an aside, would you happen to know why ffmpeg behaves differently in regards to overlay framerate with filter_complex as opposed to vf? One of my videos is 8fps and the other is 25fps. When I tried using vf with the unaltered video files, I ran into the problem described [here](http://stackoverflow.com/questions/5890738/overlaying-video-with-ffmpeg). But with filter_complex, everything seems to work by default! — Archagon, Nov 15 '12 at 23:08
Actually, scratch that last question. When I was using vf, I overlayed my 25fps video on top of my 8fps video sans background image, which made the final video 8fps. And from the ffmpeg log, it looks like the image is rendered at 25fps by default, which is why my filter_complex call returns a 25fps video. — Archagon, Nov 15 '12 at 23:33
@Archagon You're encoding with `libx264`, so you can use a faster preset than the default `medium`. See the CRF section of the [FFmpeg and x264 Encoding Guide](https://ffmpeg.org/trac/ffmpeg/wiki/x264EncodingGuide) for an example and more information. — llogan, Nov 16 '12 at 01:35
LordNeckbeard, if you read this, can you help me with one more thing? I'm trying to map the audio from two of my inputs into two audio channels in my output video. I see there's a "copy" filter, but I can't seem to get it to work, and besides I'd prefer to do as little in filter_complex as possible. What command can I use to map the audio from multiple inputs to a single output? The ffmpeg documentation is really kicking my butt! — Archagon, Mar 23 '13 at 11:06
@Archagon Here's an example to merge two stereo inputs into one stereo output: `ffmpeg -i input1 -i input2 -filter_complex "amerge,pan=stereo:c0 — llogan, Mar 23 '13 at 20:04
Thank you, the best explanation I've ever read about ffmpeg filters! — Antonio Petricca, Aug 21 '14 at 05:48
You have no idea how many searches I did, how much documentation I read, or how many things I tried before finding this answer which finally helped me understand it and solve my problem. Thank you! — David Conrad, Feb 17 '19 at 16:56
Thanks, is there a way to add dynamic image overlay on live stream video input? Means we can keep changing image overlay after video stream started? — Ankit Maheshwari, Oct 30 '20 at 17:00
Dynamic means I am able to add image overlay which can be changed in every x seconds, so that I can add additional information to video (Just like video caption runs) the time when video is streaming from phone to any RTMP. — Ankit Maheshwari, Nov 02 '20 at 15:43
@AnkitMaheshwari See [Change image overlay on demand](https://stackoverflow.com/a/49467812/) and [Live stream with changeable overlay](https://video.stackexchange.com/a/29458). — llogan, Nov 02 '20 at 19:46
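
Following up on the preset suggestion in the comments above, a minimal sketch of the answer's example with a faster libx264 preset might look like this. The file names follow the answer's example. The preset choice is an illustrative assumption rather than a recommendation from the thread, and 23 is the libx264 default CRF that also appears in the log in the question. The script only calls ffmpeg when the tool and the three inputs are actually present.

```shell
#!/bin/sh
# Illustrative values: a faster preset trades file size for encoding speed.
preset=veryfast
crf=23

# Same filtergraph as the two-videos-over-image example, on one line.
filtergraph='[1:v]scale=(iw/2)-20:-1[a];[2:v]scale=(iw/2)-20:-1[b];[0:v][a]overlay=10:(main_h/2)-(overlay_h/2):shortest=1[c];[c][b]overlay=main_w-overlay_w-10:(main_h/2)-(overlay_h/2)[video]'

if command -v ffmpeg >/dev/null 2>&1 &&
   [ -f background.png ] && [ -f video1.mp4 ] && [ -f video2.mp4 ]; then
    ffmpeg -loop 1 -i background.png -i video1.mp4 -i video2.mp4 \
        -filter_complex "$filtergraph" \
        -map "[video]" -c:v libx264 -preset "$preset" -crf "$crf" output.mkv
else
    echo "ffmpeg or the example inputs are missing; nothing was encoded" >&2
fi
```

Presets further toward ultrafast encode more quickly but produce larger files at the same CRF, so it is worth timing a short clip with a few presets before settling on one.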