I am concatenating multiple (max 25) audio files using SoX with

sox first.mp3 second.mp3 third.mp3 result.mp3

which does what it is supposed to: it concatenates the given files into one file. Unfortunately, there is a small time gap between those files in result.mp3. Is there a way to remove this gap?

Before concatenating, I create first.mp3, second.mp3 and so on by mixing multiple audio files (same length, format and sample rate):

sox -m drums.mp3 bass.mp3 guitar.mp3 first.mp3

How can I check and ensure that no time gap is added to any of those files (both the merged and the concatenated ones)?

I need to achieve a seamless playback of all the concatenated files (when playing them in browser one after another it works ok).

Thank you for any help.

EDIT:

The exact example (without real file-names) of a command I am running is now:

sox "|sox -m file1.mp3 file2.mp3 file3.mp3 file4.mp3 -p" "|sox -m file1.mp3 file6.mp3 file7.mp3 -p" "|sox -m file5.mp3 file6.mp3 file4.mp3 -p" "|sox -m file0.mp3 file2.mp3 file9.mp3 -p" "|sox -m file1.mp3 file15.mp3 file4.mp3 -p" result.mp3

This merges the files and pipes them directly into the concatenation command. The resulting mp3 (result.mp3) has an ever so slight delay between the concatenated files. Any ideas really appreciated.

trainoasis
  • mp3 is a lossy format, you should not use it anywhere except probably the very last encoding step, because each conversion to mp3 damages the audio. – Display Name Jun 18 '15 at 10:34
  • Did you fix this problem? I have the same issue when concatenating multiple audio/video files into one video with FFMPEG: there is about a second of silence at every timestamp where two audio clips were concatenated. – VnDevil Nov 14 '22 at 10:00
  • This was long time ago, but yeah, I did. See the answer below and the chat we discussed in (comment section). – trainoasis Nov 14 '22 at 17:38

3 Answers

The best — though least helpful — way to do this is not to use MP3 files as your source files. WAV, FLAC or M4A files don't have this problem.
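As a rough sketch of that advice applied to the question's workflow: keep the stems and the intermediate segments as WAV, and encode to MP3 only once at the very end. The file names and the helper names `mix_segment` and `join_segments` are hypothetical, and this assumes the stems are available in a lossless format and that the installed sox was built with MP3 write support.

```shell
#!/bin/sh
# mix_segment OUT STEM...: mix equally long stems into one lossless segment.
mix_segment() {
    out=$1; shift
    sox -m "$@" "$out"
}

# join_segments OUT SEGMENT...: concatenate segments in the given order.
# The output format follows the extension of OUT, so passing result.mp3
# here makes this the only lossy encoding step.
join_segments() {
    out=$1; shift
    sox "$@" "$out"
}
```

For example, `mix_segment first.wav drums.wav bass.wav guitar.wav` followed by `join_segments result.mp3 first.wav second.wav third.wav`. The final MP3 may still carry padding at its very start and end, but the joins inside it are made on exact WAV sample boundaries.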

MP3s aren't made up of fixed-rate samples, so cropping out a section of an arbitrary length will not work as you expect. Unless the encoder was smart (like lame), there will often be a gap at the start or end of the MP3 file's audio. I did a test with a sample 0.98s long (which is precisely 73½ CDDA frames, and many MP3 encoders use frames for minimum sample lengths). I then encoded the sample with three different MP3 encoders (lame, sox, and the ancient shine), then decoded those files with three decoders (lame, sox, and madplay). Here's how the sample lengths compare to the original:

 Enc.→Dec.          Length     Samples  CDDA Frames
 -----------------  ---------  -------  -----------
 shine→lame         0.95"      42095    71.5901
 shine→madplay      0.97"      42624    72.4898
 shine→sox          0.97"      42624    72.4898
 lame→lame          0.98"      43218    73.5000
*Original           0.98"      43218    73.5000
 sox→sox            0.99"      43776    74.4490
 sox→lame           1.01"      44399    75.5085
 lame→madplay       1.02"      44928    76.4082
 lame→sox           1.02"      44928    76.4082
 sox→madplay        1.02"      44928    76.4082

Only the file encoded and decoded by lame ended up the same length (mostly because lame inserts a length tag to correct for these too-short samples, and knows how to decode it). Everything encoded by sox ended up with a tiny gap, no matter what decoder I used. So joining the files will result in tiny clicks.
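The sample counts in the table can be checked against the MP3 frame size. An MPEG-1 Layer III frame holds 1152 samples, and the short sketch below (the helper name `frames_in` is made up for illustration) divides some of the decoded lengths by that figure. This is only arithmetic on the numbers already shown, not a new measurement.

```shell
#!/bin/sh
# Samples per MPEG-1 Layer III frame (a property of the format).
MP3_FRAME=1152

# frames_in SAMPLES: print "whole_frames remainder" for a sample count.
frames_in() {
    echo "$(( $1 / $MP3_FRAME )) $(( $1 % $MP3_FRAME ))"
}

# Sample counts taken from the table above.
for n in 42624 43218 43776 44928; do
    set -- $(frames_in "$n")
    echo "$n samples = $1 MP3 frames + $2 samples"
done
```

The lengths 42624, 43776 and 44928 come out as exactly 37, 38 and 39 frames, while the original 43218 samples is 37 frames plus 594 samples. That is consistent with the decoded audio being rounded to whole frames unless a length tag tells the decoder where the real audio ends.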

Your browser is likely mixing and overlapping the source files very slightly so you don't hear the clicks. Gapless playback is hard to do correctly.

scruss
  • *VBR (Variable bitrate) MP3s aren't made up of fixed-rate samples. CBR MP3s are. The assertion is functionally correct though as many MP3s are VBR (a default setting in most encoders, and it saves space), and it takes extra effort to determine whether MP3s are CBR before concatenating. Better to avoid them altogether, even if it's just transcoding to WAV first. – joshfindit Aug 08 '19 at 14:37
  • CBR mp3s may be fixed-rate samples, but unless you're careful to crop exactly on audio frame boundaries, you'll still get dropouts. Worse still, if the cropped mp3s don't have a length tag, the decoder's got no way of telling if the extra data at the end of the file is legitimate silence or frame padding. You may transcode to WAVs complete with gaps/clicks in them. – scruss Feb 01 '21 at 15:08
This is my guess for your issue:

  • sox does not add time gap during concatenation,
  • however, it adds a time gap in other operations, for instance if you do a conversion before the concatenation.

To find out what happens, I suggest you check the duration of every file at each step (you can use soxi, for instance) to see where the extra time appears.
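One way to do this check is to compare sample counts rather than rounded durations. The following is a minimal sketch, assuming soxi is installed; `sum_samples` and `check_concat` are hypothetical helper names, not part of SoX. It relies on `soxi -s`, which prints the number of samples in a file.

```shell
#!/bin/sh
# sum_samples: add up integer sample counts read one per line from stdin.
sum_samples() {
    total=0
    while read -r n; do
        total=$(( $total + $n ))
    done
    echo "$total"
}

# check_concat RESULT PART...: compare the sample count of RESULT with the
# sum of the sample counts of the parts it was concatenated from.
check_concat() {
    result=$1; shift
    expected=$(for f in "$@"; do soxi -s "$f"; done | sum_samples)
    actual=$(soxi -s "$result")
    echo "expected=$expected actual=$actual diff=$(( $actual - $expected ))"
}
```

Usage would look like `check_concat result.mp3 first.mp3 second.mp3 third.mp3`. A non-zero `diff` suggests samples were added somewhere other than the join itself. For MP3 inputs the reported counts may already include encoder padding, so running the same check on WAV copies gives a cleaner picture.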

If that doesn't explain it (that is, the time gap is added during concatenation), let me make another guess:

  • SoX adds a time gap because the samples at the beginning or at the end of your files are not close to zero.

To solve this, you could apply a very short fade-in and fade-out to your files.

Moreover, to force sox to output files with a well-defined length, you could use the trim effect like this (effects go after the output file name; replace duration with the length you want):

sox filein.mp3 fileout.mp3 trim 0 duration
PatriceG
  • Thanks for your answer. Please see my edited question with an example of my command, if it helps at all (because you mentioned 'other operations', and I am using piping and merging as well; no conversion though). – trainoasis Dec 04 '14 at 14:31
  • Could you check the duration of each result separately to see when sox adds a time gap? – PatriceG Dec 04 '14 at 14:41
  • I used 'sox result.mp3 -n stat 2>&1' to see what one of the merged files comes to, and it seems the length is the same :/ – trainoasis Dec 04 '14 at 14:54
  • When I check the time of the resulting file, and if I divide it by the number of merges made it comes down to 4.0489797s instead of 4.048980s, which is what I get for a single file. Do you think this could actually make such a difference? – trainoasis Dec 04 '14 at 15:00
  • No, I don't. I think this difference is not audible. But I suppose that the added time gaps are audible. Can you hear them, or see them in software like Audacity? – PatriceG Dec 04 '14 at 15:10
  • Do all your files have the same length? – PatriceG Dec 04 '14 at 15:12
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/66214/discussion-between-trainoasis-and-sandoval31). – trainoasis Dec 04 '14 at 15:20
First, you really need to check that there is no silence at the start and the end of your files. I don't know if sox can do it, but you need to check the energy (RMS, dB) of the audio signal at the start and end and cut any leading and trailing silence. To join audio files without gaps, you need to apply a window function to your signal so that it works like a fade-in/fade-out, and then crossfade the beginning of one file with the end of the other.

sox provides a splice effect for crossfading:

splice [−h|−t|−q] { position[,excess[,leeway]] }
Splice together audio sections. This effect provides two things over simple audio concatenation: a (usually short) cross-fade is applied at the join, and a wave similarity comparison is made to help determine the best place at which to make the join.

Check Documentation here

ederwander
  • Thank you for your answer. I'm not exactly sure how to use splice in my case; it's rather a bit complex. Could you provide an example for my case? – trainoasis Aug 14 '14 at 08:00
  • You need to test what works best for you. Try `sox first.mp3 second.mp3 third.mp3 result.mp3 splice -q 4,1` or `sox first.mp3 second.mp3 third.mp3 result.mp3 splice -h 2,1` or `sox first.mp3 second.mp3 third.mp3 result.mp3 splice -t 2,2`, and find the best options for you ... – ederwander Aug 14 '14 at 12:19
  • None of these worked for me (I also tried my own, of course). Thanks for your effort anyway – trainoasis Aug 14 '14 at 13:10