I am having some odd vertical scaling issues with librosa.feature.melspectrogram(). It seems that when I use librosa.load() with sr=None, the Hz scale doesn't coincide with the intended spectrographic features. To investigate this further, I looked at a pure 1,000Hz tone which I got from https://www.mediacollege.com/audio/tone/download/
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
filename = '1kHz_44100Hz_16bit_05sec.wav'
y1, sr1 = librosa.load(filename,sr=None)
y2, sr2 = librosa.load(filename)
fig, ax = plt.subplots(1,2)
S = librosa.feature.melspectrogram(y1, sr=sr1, n_mels=128)
S_DB = librosa.power_to_db(S, ref=np.max)
librosa.display.specshow(S_DB, sr=sr1, y_axis='mel', ax=ax[0]);
ax[0].title.set_text(f"sr1={sr1}\nload(filename,sr=None)")
S = librosa.feature.melspectrogram(y2, sr=sr2, n_mels=128)
S_DB = librosa.power_to_db(S, ref=np.max)
librosa.display.specshow(S_DB, sr=sr2, y_axis='mel', ax=ax[1]);
ax[1].title.set_text(f"sr2={sr2}\nload(filename)")
plt.tight_layout()
I'm not sure why the 1kHz tone is not lining up in both spectrograms. I would suspect the one with sr=None to be the more accurate as it is using the actual samplerate from the file. Would anyone know why there is a difference? The feature in the left plot is obviously not at 1kHz, but more like 800Hz or so. Thanks.