
I am loading the same WAV file in two ways, with wave.readframes() and with librosa.load():

import librosa
import numpy as np
import wave

sample_wave = './data/mywave.wav'

# open the file with librosa
a, sr = librosa.load(sample_wave,sr=44100)

print(len(a))
print(a)

# open the file with wave
wav=wave.open(sample_wave)
data=wav.readframes(wav.getnframes())
b = np.frombuffer(data,dtype=np.int16)

print(len(b))
print(b)

It prints the following:

490255 # length of the librosa data
[-3.0517578e-05  3.9672852e-04 -3.0517578e-05 ...  3.0517578e-05
  3.0517578e-05  0.0000000e+00] # the librosa data
490255 # length of the wave data
[-1 13 -1 ...  1  1  0] # the wave data

OK, both arrays have the same length, 490255.

However, the values are completely different (my guess is that the wave values are roughly one-third of the librosa values?).

Why does this difference happen?

whitebear

2 Answers


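The reason is that librosa scales the integer samples to floating point with its buf_to_float() function. For 16-bit audio this divides every sample by 32768, so the wave value -1 becomes -3.0517578e-05 and 13 becomes 3.9672852e-04.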

To give your wave frames the same values as librosa, you can scale the data yourself:

import numpy as np
import wave

# Load audio
wav = wave.open(sample_wave, 'rb')
frames = wav.readframes(wav.getnframes())
data = np.frombuffer(frames, dtype=np.int16).astype(np.float32)

# Scale the audio to [-1.0, 1.0): 1/32768 for 16-bit samples, the same factor librosa uses
scale = 1./float(1 << ((8 * wav.getsampwidth()) - 1))
data *= scale
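
As a quick check of that scale factor, here is a minimal sketch that uses only the standard library, so it runs without numpy or librosa. It assumes 16-bit little-endian PCM, writes a tiny temporary WAV file containing the first and last integer samples printed in the question, and reads it back. The helper name `read_pcm16_as_float` is hypothetical and used only for illustration.

```python
import os
import struct
import tempfile
import wave


def read_pcm16_as_float(path):
    """Read a 16-bit PCM WAV file and return (int_samples, float_samples)."""
    with wave.open(path, "rb") as wav:
        sampwidth = wav.getsampwidth()
        if sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    count = len(frames) // sampwidth
    # WAV stores samples little-endian; "<h" is a signed 16-bit integer.
    ints = list(struct.unpack("<%dh" % count, frames))
    # Same factor as in the answer above: 1 / 2**15 = 1 / 32768.
    scale = 1.0 / float(1 << (8 * sampwidth - 1))
    floats = [s * scale for s in ints]
    return ints, floats


# Assumption: these are the integer samples shown in the question's output.
samples = [-1, 13, -1, 1, 1, 0]

fd, tmp_path = tempfile.mkstemp(suffix=".wav")
os.close(fd)
try:
    with wave.open(tmp_path, "wb") as out:
        out.setnchannels(1)      # mono
        out.setsampwidth(2)      # 16-bit
        out.setframerate(44100)
        out.writeframes(struct.pack("<%dh" % len(samples), *samples))
    ints, floats = read_pcm16_as_float(tmp_path)
finally:
    os.remove(tmp_path)

print(ints)
print(floats)
```

The scaled values should agree with the librosa output in the question to the printed precision, for example 13 / 32768 = 0.000396728515625.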
Hai Tran

By default, librosa.load converts the samples to floating point. wave.readframes returns the data in the raw format found in the file, and your code interprets those bytes as 16-bit integers, which is apparently the correct format for that file.

Try changing the call of librosa.load to

a, sr = librosa.load(sample_wave,sr=44100, dtype=np.int16)
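
If the goal is only to compare the two arrays, the float samples can also be mapped back to the 16-bit integer range by hand after loading. The following is a minimal sketch, assuming 16-bit PCM and using the float values printed in the question; `float_to_int16` is a hypothetical helper, not part of librosa or wave.

```python
def float_to_int16(samples):
    """Map floats in [-1.0, 1.0) back to 16-bit integer sample values."""
    out = []
    for x in samples:
        # Undo the 1/32768 scaling, then round to the nearest integer.
        v = int(round(x * 32768.0))
        # Clamp to the valid int16 range in case of values at or beyond +/-1.0.
        out.append(max(-32768, min(32767, v)))
    return out


# Assumption: the float values shown in the question's librosa output.
librosa_values = [-3.0517578e-05, 3.9672852e-04, -3.0517578e-05,
                  3.0517578e-05, 3.0517578e-05, 0.0]

restored = float_to_int16(librosa_values)
print(restored)  # [-1, 13, -1, 1, 1, 0]
```

Rounding matters here: the printed floats are truncated to eight significant digits, so a plain int() cast would turn -0.99999999 into 0 instead of -1.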
Warren Weckesser