Why are scipy and librosa different for reading wav file?

Question

So I'm trying to get the samples from a wave file and I noticed that it's a different value depending on whether I use scipy or librosa.

import librosa

sampleFloats, fs = librosa.load('hi.wav', sr=48000)
print('{0:.15f}'.format(sampleFloats[len(sampleFloats)-1]))

from scipy.io.wavfile import read as wavread
# from python_speech_features import mfcc

samplerate, x = wavread('hi.wav')  # x is a numpy array of integers representing the samples

# scale to -1.0 -- 1.0
if x.dtype == 'int16':
    nb_bits = 16 # -> 16-bit wav files
elif x.dtype == 'int32':
    nb_bits = 32 # -> 32-bit wav files
else:
    raise ValueError('unsupported sample dtype: {0}'.format(x.dtype))
max_nb_bit = float(2 ** (nb_bits - 1))
samples = x / (max_nb_bit + 1.0) # samples is a numpy array of floats representing the samples

print(samples[len(samples)-1])

The print statements read:

0.001251220703125
0.001274064182641886

The sample rate for the file is 48000.

Why might they be different? Is librosa using a different normalization?

FYI (Python tip): `sampleFloats[len(sampleFloats)-1]` can be simplified to `sampleFloats[-1]` — Warren Weckesser, Jul 24 '18 at 16:22
Why do you divide by `max_nb_bit + 1.0`? I suspect that should be `max_nb_bit - 1`. — Warren Weckesser, Jul 24 '18 at 16:25
What happens if you use `sr=None` in the call to `librosa.load()`? — Warren Weckesser, Jul 24 '18 at 16:28
Thanks for the python tip Warren. The normalization code I found elsewhere, but the values match up exactly right with samples when I grab them with iOS audio framework code, so I’m fairly confident that part is working well. If you put None into the load() it uses a default sampling of 44100 which turns out to be incorrect as well. That’s expected since the sampling is at 48000 — MScottWaller, Jul 24 '18 at 20:38
My understanding of librosa is that when you specify the sample rate, the code resamples the data in the file. If you use `sr=None` and get a sample rate of 44100 with librosa, then that is the same sample rate that the scipy reader will see (check: what is `samplerate` in the python code?). When you give librosa the argument `sr=48000`, it will resample the signal from 44100 to 48000 and return the resampled signal. So librosa is doing more preprocessing than the scipy code, and the signals that you are comparing (librosa output and scipy output) are not sampled at the same rate. — Warren Weckesser, Jul 24 '18 at 23:51
Good call. So you're right that sr=None returns the sample rate 48000, so I didn't need to specify it. However, even when using sr=None, the values remain different, even though the sample rate is the same for the two of them. — MScottWaller, Jul 25 '18 at 01:57

score 1 · Accepted Answer · answered Jun 06 '19 at 10:54

It's a type mismatch. It is often useful to print not only the value but also its type. In this case, because of the way the normalisation is done, the values in `samples` are float64, while librosa returns float32.

This answer can help to figure out how to normalise (also, as pointed out above, the divisor is indeed `max_nb_bit - 1`, not `max_nb_bit + 1`).
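A quick way to check both points (the dtype and the divisor) without touching the file is sketched below. The sample values are made up for illustration; in particular, 41 is only an assumed stand-in for the last sample of `hi.wav`, chosen because 41 / 32768 happens to print as the first value in the question. The three divisors are the candidates mentioned in this thread, and which one a given reader actually uses is not confirmed here.

```python
import numpy as np

# Hypothetical int16 samples standing in for the array returned by wavread.
x = np.array([0, 41, -32768, 32767], dtype=np.int16)

# Dividing an integer array by a Python float gives float64.
samples64 = x / float(2 ** 15)

# Casting to float32 gives the dtype that librosa returns.
samples32 = samples64.astype(np.float32)

print(samples64.dtype, samples32.dtype)

# Compare the candidate divisors on one sample (converted to a plain int first).
last = int(x[1])
for divisor in (2 ** 15 - 1, 2 ** 15, 2 ** 15 + 1):
    print(divisor, '{0:.15f}'.format(last / divisor))
```

Printing `x.dtype`, `samples.dtype` and `sampleFloats.dtype` in the original script, and then trying each divisor on the raw integer sample, shows which combination reproduces the librosa value.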
