I am using the following ffmpeg "amerge" command to mix two audio files:
ffmpeg -i voice.mp3 -i music.mp3 -filter_complex "[0:a]volume=1dB[a0];[1:a]volume=0.5[a1];[a0][a1]amerge=inputs=2[a]" -map "[a]" -strict -2 -y output.mp3
The voice.mp3 file also contains silences in the middle of the audio, and the positions of those silences are completely dynamic.
Currently, the voice volume is set to 1dB and the music volume is set to 0.5. Because of this, the audio sounds too quiet whenever there is no voice. If I increase the background music volume, it spoils the clarity of the voice.
Is there a way to have the voice and music volumes adjusted dynamically while mixing, using "ffmpeg" or a similar tool?
I know this is possible by writing code that separates the silent and voiced segments, mixes each with the music individually, and then merges everything together. With that method, however, it is difficult to keep the music flowing without audible jumps, and it also requires a lot of coding and testing.
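To make the goal concrete: the behaviour described above is usually called "ducking", and ffmpeg's sidechaincompress filter looks like a candidate for it. The following is only an untested sketch. The file names are the ones from the command above; the function name duck_mix, the stream labels, and every numeric setting (threshold, ratio, attack, release) are assumptions that would need tuning by ear, and amix is swapped in for amerge here because amix sums the streams rather than stacking their channels.

```shell
#!/bin/sh
# Sketch only: lower the music while the voice is audible, let it rise in the gaps.
# File names are taken from the question; all numeric values are untuned guesses.

VOICE="voice.mp3"
MUSIC="music.mp3"
OUTPUT="output.mp3"

# asplit            : one copy of the voice is mixed, the other only drives the compressor.
# sidechaincompress : compresses the music whenever the voice copy exceeds the threshold;
#                     attack/release (in ms) control how quickly the music dips and recovers.
# amix              : sums the voice and the ducked music; duration=first stops at the end of the voice.
FILTER="[0:a]asplit=2[voice][key];[1:a][key]sidechaincompress=threshold=0.05:ratio=8:attack=20:release=800[ducked];[voice][ducked]amix=inputs=2:duration=first[a]"

# Hypothetical helper: runs the mix with the filter graph above.
duck_mix() {
  ffmpeg -i "$VOICE" -i "$MUSIC" -filter_complex "$FILTER" -map "[a]" -y "$OUTPUT"
}
```

Calling duck_mix in the directory that holds both files would run the mix. A longer release value should make the music return more gradually after each phrase, which is the "no jerks" behaviour I am after, but I have not verified which values work for speech.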