
I am seeing odd vertical scaling with librosa.feature.melspectrogram(). When I use librosa.load() with sr=None, the Hz axis does not line up with the spectral features I expect. To investigate, I looked at a pure 1,000 Hz tone, which I downloaded from https://www.mediacollege.com/audio/tone/download/

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

filename = '1kHz_44100Hz_16bit_05sec.wav'
# sr=None keeps the file's native sample rate; the default resamples on load.
y1, sr1 = librosa.load(filename, sr=None)
y2, sr2 = librosa.load(filename)

fig, ax = plt.subplots(1, 2)

# Left panel: native sample rate. The signal is passed as y=..., since newer
# librosa releases only accept it as a keyword argument.
S = librosa.feature.melspectrogram(y=y1, sr=sr1, n_mels=128)
S_DB = librosa.power_to_db(S, ref=np.max)
librosa.display.specshow(S_DB, sr=sr1, y_axis='mel', ax=ax[0])
ax[0].title.set_text(f"sr1={sr1}\nload(filename,sr=None)")

# Right panel: librosa's default (resampled) sample rate.
S = librosa.feature.melspectrogram(y=y2, sr=sr2, n_mels=128)
S_DB = librosa.power_to_db(S, ref=np.max)
librosa.display.specshow(S_DB, sr=sr2, y_axis='mel', ax=ax[1])
ax[1].title.set_text(f"sr2={sr2}\nload(filename)")

plt.tight_layout()
plt.show()

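[Figure: two mel spectrograms side by side. Left panel: sr1=44100, load(filename,sr=None). Right panel: sr2=22050, load(filename).]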

I'm not sure why the 1 kHz tone does not line up in the two spectrograms. I would expect the one with sr=None to be the more accurate, since it uses the actual sample rate from the file. Would anyone know why there is a difference? The feature in the left plot is clearly not at 1 kHz; it sits closer to 800 Hz or so. Thanks.

zenith7
  • Have you checked the values of sr1 and sr2? – Jon Nordby Dec 20 '20 at 20:32
  • I have updated the code to show sr1 and sr2 in the plot titles. sr1=44100 Hz is the wav file's actual sample rate. The left plot uses this rate, so it should show the source frequency at 1 kHz, which it doesn't. The Nyquist frequencies of both instances (nqf1=22,050 Hz and nqf2=11,025 Hz) are well above 1 kHz, so neither sample rate should have any problem picking up the 1 kHz signal. – zenith7 Dec 21 '20 at 04:08
  • The FFT operates on discrete bins, so maybe that is the problem. Try increasing n_fft and hop_length for melspectrogram by a factor of 2 for the 44100 Hz case? – Jon Nordby Dec 21 '20 at 10:27
  • I tried values from n_fft=2048*64 all the way up to n_fft=len(y1) and nothing changes. The hop_length makes no difference, as the source file is a pure tone. I'm beginning to believe that librosa has issues. – zenith7 Dec 21 '20 at 11:22
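
The discrete-bin suggestion in the comments can be sanity-checked with plain arithmetic, without librosa. The sketch below is only illustrative: nearest_fft_bin is a hypothetical helper name, and n_fft=2048 is assumed because the melspectrogram calls in the question do not set it and that is, as far as I know, librosa's default. It covers STFT bin spacing only; it says nothing about the mel filterbank or about how specshow draws its axis.

```python
def nearest_fft_bin(freq_hz, sr, n_fft=2048):
    """Return (bin index, bin centre in Hz, bin spacing in Hz) for freq_hz.

    Hypothetical helper for illustration; n_fft=2048 is an assumed default.
    """
    spacing = sr / n_fft          # distance between adjacent FFT bin centres
    k = round(freq_hz / spacing)  # index of the bin closest to freq_hz
    return k, k * spacing, spacing


for sr in (44100, 22050):
    k, centre, spacing = nearest_fft_bin(1000.0, sr)
    print(f"sr={sr}: bin {k}, centre {centre:.1f} Hz, spacing {spacing:.1f} Hz")
```

With these assumptions the nearest bin centre is about 990.5 Hz at 44100 Hz (spacing about 21.5 Hz) and about 1001.3 Hz at 22050 Hz (spacing about 10.8 Hz). The worst-case error from bin quantization is half a bin, roughly 11 Hz, so it cannot account for a shift of around 200 Hz. That is consistent with the observation that changing n_fft made no difference.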
