
Preface

I'm working on converting 4k videos to multiple quality levels with multiple audio languages, but I'm having issues with overlaying the language tracks: they sometimes lose quality and sometimes end up out of sync. (This is less of a problem with the German audio, as that is a voice-over anyway.)

As a team we are complete noobs when it comes to video / audio + HLS -- I'm a front-end developer with no experience of this, so apologies if my question is poorly phrased.


Videos

I have the video in a 4k format and have removed the original sound, as I have English and German audio files that need to be overlaid. I then take these files and mux them together into a .ts file like this:

ffmpeg -i ep03-ns-4k.mp4 -i nkit-ep3-de-output.m4a -i nkit-ep3-en-output.m4a \
-threads 0 -muxdelay 0 -y \
-map 0:v -map 1 -map 2 -movflags +faststart -refs 1 \
-vcodec libx264 -acodec aac -profile:v baseline -level 30 -ar 44100 -ab 64k -f mpegts out.ts

This outputs a 4k out.ts video, with both audio tracks playing.
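A quick sanity check on the mux -- mainly comparing the reported start times and durations of the video and the two audio tracks, which seems relevant to the sync problem above -- is an ffprobe call along these lines:

ffprobe -v error -show_entries stream=index,codec_type,codec_name,start_time,duration -of compact out.ts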

The hard part

This is where I'm finding it tricky: I now need to convert this single file into multiple quality levels (480, 720, 1080, 1920), which I attempt with the following command:

ffmpeg -hide_banner -y -i out.ts \
-crf 20 -sc_threshold 0 -g 48 -keyint_min 48 -ar 48000 \
-map 0:v:0 -map 0:v:0 -map 0:v:0 -map 0:v:0 \
-c:v:0 h264 -profile:v:0 main -filter:v:0 "scale=w=848:h=480:force_original_aspect_ratio=decrease" -b:v:0 1400k -maxrate:v:0 1498k -bufsize:v:0 2100k \
-c:v:1 h264 -profile:v:1 main -filter:v:1 "scale=w=1280:h=720:force_original_aspect_ratio=decrease" -b:v:1 2800k -maxrate:v:1 2996k -bufsize:v:1 4200k \
-c:v:2 h264 -profile:v:2 main -filter:v:2 "scale=w=1920:h=1080:force_original_aspect_ratio=decrease" -b:v:2 5600k -maxrate:v:2 5992k -bufsize:v:2 8400k \
-c:v:3 h264 -profile:v:3 main -filter:v:3 "scale=w=3840:h=1920:force_original_aspect_ratio=decrease" -b:v:3 11200k -maxrate:v:3 11984k -bufsize:v:3 16800k \
-var_stream_map "v:0 v:1 v:2 v:3" \
-master_pl_name master.m3u8 \
-f hls -hls_time 4 -hls_playlist_type vod -hls_list_size 0 \
-hls_segment_filename "%v/episode-%03d.ts" "%v/episode.m3u8"

This creates the required quality levels, but I'm now at a loss as to how this might work with the audio.
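The nearest thing I have found so far is the audio-group support in ffmpeg's hls muxer, i.e. mapping the audio streams into this same command and tying them to the video renditions via agroup in -var_stream_map. A rough, untested sketch (only two video renditions shown for brevity; the group name "aud", the 128k audio bitrate and the language tags are values I picked, and 0:a:0 / 0:a:1 are the German and English tracks per the mux order above):

# Sketch only: video renditions + both audio languages in one HLS job,
# tied together with an audio group ("aud") in -var_stream_map.
ffmpeg -hide_banner -y -i out.ts \
-crf 20 -sc_threshold 0 -g 48 -keyint_min 48 -ar 48000 \
-map 0:v:0 -map 0:v:0 -map 0:a:0 -map 0:a:1 \
-c:v:0 h264 -profile:v:0 main -filter:v:0 "scale=w=848:h=480:force_original_aspect_ratio=decrease" -b:v:0 1400k -maxrate:v:0 1498k -bufsize:v:0 2100k \
-c:v:1 h264 -profile:v:1 main -filter:v:1 "scale=w=1280:h=720:force_original_aspect_ratio=decrease" -b:v:1 2800k -maxrate:v:1 2996k -bufsize:v:1 4200k \
-c:a aac -b:a 128k \
-var_stream_map "v:0,agroup:aud v:1,agroup:aud a:0,agroup:aud,language:deu,default:yes a:1,agroup:aud,language:eng" \
-master_pl_name master.m3u8 \
-f hls -hls_time 4 -hls_playlist_type vod -hls_list_size 0 \
-hls_segment_filename "%v/episode-%03d.ts" "%v/episode.m3u8"

As I understand it, ffmpeg then writes audio-only media playlists for the two languages and references them from master.m3u8 via EXT-X-MEDIA, but I haven't confirmed whether this helps with the sync issue.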

Audio

For the audio I run this command:

ffmpeg -i out.ts -threads 0 -muxdelay 0 -y -map 0:a:0 -codec copy -f segment -segment_time 4 -segment_list_size 0 -segment_list audio-de/audio-de.m3u8 -segment_format mpegts audio-de/audio-de_%d.aac
ffmpeg -i out.ts -threads 0 -muxdelay 0 -y -map 0:a:1 -codec copy -f segment -segment_time 4 -segment_list_size 0 -segment_list audio-en/audio-en.m3u8 -segment_format mpegts audio-en/audio-en_%d.aac

This creates the required audio segments.
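The part I haven't worked out with this split approach is the master playlist: as far as I can tell, the master.m3u8 written by the video-only command above never references these audio playlists, so it would need EXT-X-MEDIA entries attaching them to the video variants through an audio group. Roughly like this hand-written sketch (the group name, NAME/LANGUAGE values, resolutions and bandwidths are illustrative; the variant paths follow the %v directories from the video command):

# Sketch of a hand-maintained master playlist linking the separate audio
# playlists to the video variant playlists (all values illustrative).
cat > master.m3u8 <<'EOF'
#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="Deutsch",LANGUAGE="de",DEFAULT=YES,AUTOSELECT=YES,URI="audio-de/audio-de.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="English",LANGUAGE="en",DEFAULT=NO,AUTOSELECT=YES,URI="audio-en/audio-en.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=1498000,RESOLUTION=848x480,AUDIO="aud"
0/episode.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2996000,RESOLUTION=1280x720,AUDIO="aud"
1/episode.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5992000,RESOLUTION=1920x1080,AUDIO="aud"
2/episode.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=11984000,RESOLUTION=3840x1920,AUDIO="aud"
3/episode.m3u8
EOF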

The question

I realise this is quite an ask, but is there anything wrong with our inputs? Is there a way this can be done in a more streamlined fashion?

Any answers are greatly appreciated.

  • You have video "A" and audio files "B", "C", "D" etc. You want to make videos like A+B, A+C, A+D etc. and each of these at multiple resolutions like 1080p, 720p, 480p etc. Is my understanding correct? – Rajib Oct 19 '21 at 09:09
  • @Rajib This is correct. We are attempting to add the audio to the videos like in your example. – Daniel Ellis Oct 19 '21 at 09:54

1 Answer


Let's say you have:

  • VideoA
  • AudioB -> Language 1
  • AudioC -> Language 2
  • AudioD -> Language 3

Although it can be done all together, it is better to use different commands for each language instance.

Note that the following are schematics only: some values and parameters will need to be filled in by you. However, this shows a scheme of how to connect the entities. Also, I have simply set the output size and NOT used a scale filter; you can use a scale filter instead. Filters go in place of the size parameter (-s 1280x720, etc.).

ffmpeg -i VideoA -i AudioB -map 0:v -map 1:a -s 1280x720 -acodec aac -b:a 128k \
-vcodec libx264 -pix_fmt yuv420p [your other parameters go here] -movflags +faststart \
OutputAB_720p.mp4 -map 0:v -map 1:a -s 1920x1080 -acodec aac -b:a 128k -vcodec \
libx264 -pix_fmt yuv420p [your other parameters go here] -movflags +faststart \
OutputAB_1080p.mp4

The above shows a scheme for 2 resolutions, 720p and 1080p, merging VideoA with AudioB. To do the same scheme for AudioC you would repeat:

ffmpeg -i VideoA -i AudioC -map 0:v -map 1:a -s 1280x720 -acodec aac -b:a 128k \
-vcodec libx264 -pix_fmt yuv420p [your other parameters go here] -movflags +faststart \
OutputAC_720p.mp4 -map 0:v -map 1:a -s 1920x1080 -acodec aac -b:a 128k -vcodec \
libx264 -pix_fmt yuv420p [your other parameters go here] -movflags +faststart \
OutputAC_1080p.mp4

You could put all the inputs together:

ffmpeg -i VideoA -i AudioB -i AudioC -i AudioD

and accordingly map each for every language:

-map 0:v -map 1:a
-map 0:v -map 2:a
-map 0:v -map 3:a
etc.
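
For illustration only, at a single resolution that combined form would look roughly like this (placeholders as before):

ffmpeg -i VideoA -i AudioB -i AudioC -i AudioD \
-map 0:v -map 1:a -s 1280x720 -acodec aac -b:a 128k -vcodec libx264 -pix_fmt yuv420p [your other parameters go here] -movflags +faststart OutputAB_720p.mp4 \
-map 0:v -map 2:a -s 1280x720 -acodec aac -b:a 128k -vcodec libx264 -pix_fmt yuv420p [your other parameters go here] -movflags +faststart OutputAC_720p.mp4 \
-map 0:v -map 3:a -s 1280x720 -acodec aac -b:a 128k -vcodec libx264 -pix_fmt yuv420p [your other parameters go here] -movflags +faststart OutputAD_720p.mp4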

But I feel the long commands that result are difficult to read, maintain and correct.

Rajib
  • Hey @Rajib, thanks for your help. I was more thinking about splitting the files into HLS segments to make streaming them from our server easier. The issue we're facing is that the audio is out of sync with the video. Would this still apply for an HLS .m3u8 manifest if it's just one video combined with the audio? I thought we had to split the audio channels separately and then the video in different resolutions – Daniel Ellis Oct 22 '21 at 09:50