
Currently, I am trying to load 280,000 MP3 audio files in Python where the average duration of files is ~5 seconds. I am using Librosa for this purpose as well as for the further processing (e.g. computing spectrogram) in later stages.

However, I realized that loading the files is very slow: on average it takes 370 milliseconds for each file to be loaded, decompressed, and resampled. If I turn off resampling (i.e. `librosa.load(..., sr=None)`), it takes around 200 milliseconds, but that is still not good enough considering the large number of files I have. Unsurprisingly, loading WAV files without resampling is very fast (< 1 ms), but with resampling it takes around 160 milliseconds.

Is there a faster approach for doing this, either directly in Python or using external tools on Linux, provided that I can later load the results back into Python?

By the way, I have tried multiprocessing with a pool of size 4 and achieved a 2-3x speed-up, but I am looking for more (preferably > 10x).

Note: the original files are human voice recordings with a sample rate of 48 kHz and a bit rate of 64 kbps; I want to downsample them to 16 kHz.

today
  • You could try [pysox](https://github.com/rabitt/pysox). – Hendrik Jul 23 '19 at 06:06
  • @hendrik Thanks a lot! I used `pysox` to downsample the MP3 files and convert them to WAV, and on average it took 20 milliseconds per file. That is much better, and even better than `ffmpeg`, which I also tried and which took 100 milliseconds for the same operation. – today Jul 23 '19 at 10:03
  • Cool. I'll make it a real answer. – Hendrik Jul 23 '19 at 10:08
  • Maybe a silly question, but I can't see how to simply read the MP3 file without doing any conversion. – moinudin Apr 12 '21 at 18:43
  • If by "conversion" you mean only the resampling step, then it is possible to skip resampling. However, MP3 files (unlike WAV files) have to be decompressed/decoded first to get the raw samples, so I guess that step is always required (though I am not an expert on the topic). – today Apr 12 '21 at 18:56

1 Answer

You could use pysox.

It's a thin Python wrapper around SoX, "the Swiss Army knife of sound processing programs."
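As a rough sketch of how this could look for the conversion described in the question (MP3 at 48 kHz to WAV at 16 kHz), assuming pysox is installed along with a SoX build that has MP3 support. The helper names `wav_path_for`, `convert_one`, and `convert_all`, the output naming scheme, and the worker and chunk sizes are all hypothetical choices for illustration, not something taken from the pysox documentation:

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from pathlib import Path


def wav_path_for(mp3_path, out_dir):
    """Map e.g. 'clips/a.mp3' to '<out_dir>/a.wav' (hypothetical naming scheme)."""
    return Path(out_dir) / (Path(mp3_path).stem + ".wav")


def convert_one(mp3_path, out_dir, sample_rate=16000):
    """Decode one MP3 and write it as a WAV file resampled to sample_rate."""
    # Imported lazily so this module can be imported even where pysox is missing.
    import sox

    tfm = sox.Transformer()
    tfm.convert(samplerate=sample_rate)  # set the output sample rate
    out_path = wav_path_for(mp3_path, out_dir)
    tfm.build(str(mp3_path), str(out_path))  # runs SoX on this one file
    return out_path


def convert_all(mp3_paths, out_dir, workers=4):
    """Convert many files in parallel; workers=4 mirrors the pool size in the question."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    job = partial(convert_one, out_dir=out_dir)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(job, mp3_paths, chunksize=64))
```

The resulting WAV files are already at the target rate, so they can then be read back with `librosa.load(path, sr=None)`, which the question reports as taking under 1 ms per file.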

Note: For faster processing (avoiding exec calls), you may also install and use soxbindings. All you need to do is replace `import sox` with `import soxbindings as sox`.
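If the same script has to run on machines where soxbindings may or may not be installed, one option is to pick the backend at import time. This is only a sketch; `load_sox_backend` is a hypothetical helper, not part of either library:

```python
import importlib
import importlib.util


def load_sox_backend(candidates=("soxbindings", "sox")):
    """Return the first importable module from candidates, in order of preference."""
    for name in candidates:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    raise ImportError("none of these modules is installed: " + ", ".join(candidates))
```

With this in place, `sox = load_sox_backend()` would prefer soxbindings and fall back to pysox, and the rest of the code can keep using the name `sox`.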
Hendrik