
When I load a file with:

librosa_audio, librosa_sample_rate = librosa.load(filename)

the resulting audio array has the following range:

Librosa audio file min~max range: -1.2105224 to 1.2942806

The file I am working with was obtained from https://www.boomlibrary.com/ and has a bit depth of 24. I converted it down to 16-bit and also up to 32-bit to work with librosa. Both converted files produced the same min/max range after being loaded with librosa.

Why does this happen?
Is there a way to load the WAV file with librosa so that the data falls within [-1, 1]?

Here is a link to the files:

https://drive.google.com/drive/folders/12a0ii5i0ugyvdMMRX4MPfWMSN0arD0bn?usp=sharing

Lukasz Tracewski
Joe
  • Can you share the exact file that gave you these results? What is your OS? – Lukasz Tracewski Dec 05 '20 at 19:35
  • @LukaszTracewski. Thanks for your reply. I am using Windows 10. I have also added a link to the files that I am having issues with. Please let me know if you are able to access them. Thanks! – Joe Dec 05 '20 at 21:27

1 Answer


The behaviour you are observing stems directly from the resampling to 22050 Hz that librosa's load does by default:

librosa.core.load(path, sr=22050)

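Resampling always alters the sample values: the band-limiting filter it applies can produce peaks that exceed those of the original samples, so the result is no longer confined to [-1, 1]. If you need that range after resampling, you have to normalize the audio yourself.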
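To illustrate why band-limiting can push peaks past the original range, here is a minimal sketch using only NumPy. It is not librosa's resampler: it downsamples a synthetic square wave by truncating its spectrum, and the signal, lengths, and variable names are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic stand-in for a hard-edged signal: a square wave whose samples
# all lie exactly within [-1, 1]. The lengths are arbitrary illustration values.
n_in, n_out = 9600, 2400          # downsample by a factor of 4
period = 960                      # samples per square-wave cycle
x = np.where((np.arange(n_in) % period) < period // 2, 1.0, -1.0)

# Band-limited downsampling: keep only the frequency bins that fit below the
# new Nyquist frequency, then transform back at the shorter length.
spectrum = np.fft.rfft(x)
y = np.fft.irfft(spectrum[: n_out // 2 + 1], n=n_out) * (n_out / n_in)

peak_before = np.max(np.abs(x))
peak_after = np.max(np.abs(y))
print(f"peak before resampling: {peak_before:.4f}")
print(f"peak after resampling:  {peak_after:.4f}")
```

The second peak comes out above 1.0 even though every input sample was within range. Removing the high-frequency content leaves ringing around each sharp edge (the Gibbs phenomenon), and a recording with strong transients can be expected to behave similarly.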

More likely, you wanted to read the audio at its native sampling rate, in which case you should pass None as sr, like this:

librosa.core.load(path, sr=None)

Example based on the audio sample you have provided:

In [4]: y, sr = librosa.load('201-AWCKARAK47Close0116BIT.wav', sr=None)
In [5]: y.max()
Out[5]: 0.9773865

In [6]: y.min()
Out[6]: -0.8358917
Lukasz Tracewski
  • Hello, thanks for your response. Sincerely appreciated! For further discussion: data from the UrbanSound8K dataset (https://urbansounddataset.weebly.com/) has no issues with the rate conversion and normalization. It seems that the sampling rate of the new files I am adding (sr=96000) is causing problems. Is the explanation that this sr is too large and leads to the inconsistent normalization? Thanks! – Joe Dec 06 '20 at 14:04
  • The normalization is done before resampling when you `load`, hence the effect. Resampling changes the underlying data by definition, so you should not expect any normalization to survive it. It does not matter whether it's 22 or 96 kHz. I'd recommend normalizing to e.g. -1 or -3 dB after you resample. You might consider using e.g. `sox` or `ffmpeg` to batch-convert all the data in parallel. These tools have excellent performance and deliver the best results. – Lukasz Tracewski Dec 06 '20 at 18:16
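The suggestion to normalize after resampling could be sketched roughly as follows. The helper `peak_normalize` and its `peak_db` parameter are hypothetical names introduced here, not part of librosa, and the short array merely stands in for the output of `librosa.load(path)` using the min/max values reported in the question.

```python
import numpy as np

def peak_normalize(y, peak_db=-1.0):
    """Scale y so that its largest absolute sample sits at peak_db dBFS.

    Hypothetical helper for illustration; not a librosa function.
    """
    y = np.asarray(y, dtype=np.float32)
    peak = np.max(np.abs(y)) if y.size else 0.0
    if peak == 0.0:
        return y  # empty or silent input: nothing to scale
    target = 10.0 ** (peak_db / 20.0)  # convert dBFS to linear amplitude
    return y * (target / peak)

# Stand-in for resampled audio, e.g. y, sr = librosa.load(path)
y = np.array([-1.2105224, 0.3, 1.2942806], dtype=np.float32)
y_norm = peak_normalize(y, peak_db=-1.0)
print(y_norm.min(), y_norm.max())
```

After this step all samples lie within [-1, 1], with the loudest one 1 dB below full scale. If a peak of exactly 1.0 is acceptable, `librosa.util.normalize(y)` should give a similar result with its default settings.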