I am new to audio processing and need some help with my project. Could someone explain the difference between the data returned by librosa.load and scipy.io.wavfile.read? The former gives an array of floats while the latter gives an integer array. And the amusing part is that the sizes of the arrays returned in the two cases are different.

Please provide some insight into this. (You may use your own audio file to reproduce the problem.)

import librosa
import scipy.io.wavfile

sig, sr = librosa.core.load(filepath, sr=None)
sig[:10]
array([ 0.00262944,  0.00108277, -0.00248273, -0.00865669, -0.0161767 ,
   -0.01958228, -0.01867038, -0.01742653, -0.01652605, -0.01589082],
  dtype=float32)

sr, y = scipy.io.wavfile.read(filepath)
y[:10]
array([  94,  -10, -217, -564, -627, -582, -527, -520, -440, -349],
  dtype=int16)

print(sig.shape)
(7711,)

y.shape
(5595,)
Satashree Roy

1 Answer

Take another look at the docstring for librosa.core.load. It says right there in the first three sentences:

Load an audio file as a floating point time series.

Audio will be automatically resampled to the given rate (default sr=22050).

To preserve the native sampling rate of the file, use sr=None.

So librosa is converting the data to floating point, and (by default) resampling the data to 22050 samples per second. You used sr=None, so I don't know why the lengths of the arrays are coming out different.
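
To make those two effects concrete, here is a minimal sketch using only NumPy. It assumes the file holds 16-bit PCM and that the float conversion amounts to dividing by 32768 (the usual convention for 16-bit audio, but an assumption about what happens inside librosa's audio backend). It also assumes a resampled signal has roughly ceil(n * target_sr / orig_sr) samples. The helper names int16_to_float32 and resampled_length are made up for illustration and are not part of librosa or SciPy.

```python
import math

import numpy as np


def int16_to_float32(samples):
    """Scale 16-bit PCM integers to float32 values in [-1.0, 1.0).

    Assumption: the float conversion is a plain division by 2**15.
    """
    return (np.asarray(samples, dtype=np.int16) / 32768.0).astype(np.float32)


def resampled_length(n_samples, orig_sr, target_sr):
    """Approximate sample count after resampling from orig_sr to target_sr.

    Assumption: the resampler rounds the scaled length up.
    """
    return math.ceil(n_samples * target_sr / orig_sr)


# First few integer samples from the question's scipy.io.wavfile.read output.
y = np.array([94, -10, -217, -564, -627], dtype=np.int16)
as_float = int16_to_float32(y)
print(as_float.dtype, as_float)

# One second of 44100 Hz audio resampled to librosa's default rate of 22050 Hz.
print(resampled_length(44100, 44100, 22050))
```

If both arrays really come from the same file read with sr=None, scaling y this way should roughly reproduce sig and the two shapes should match. If they do not, it is worth checking that both calls read the same file and comparing the two returned sample rates.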

Warren Weckesser
  • Thank you for your response, but I did use sr=None. So where do those particular float values come from? Are they normalized? – Satashree Roy Jun 01 '19 at 15:26
  • I added a note about your use of `sr=None`, just to say "I don't know why...". You should probably unaccept this answer (I don't mind!) if you want to attract more complete answers. – Warren Weckesser Jun 01 '19 at 16:23