log mel spectrogram using librosa

Question

I have come across two different ways of generating log-mel spectrograms for audio files using librosa, and I don't know why they differ in the final output, which one is "correct", or how different one is from the other.

#1

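import librosa
import librosa.display

path = "path/to/my/file"
scale, sr = librosa.load(path)  # resamples to sr=22050 by default
# y and sr must be passed as keywords in librosa >= 0.10
mel_spectrogram = librosa.feature.melspectrogram(y=scale, sr=sr, n_fft=2048, hop_length=512, n_mels=10, fmax=8000)
log_mel_spectrogram = librosa.power_to_db(mel_spectrogram)  # power -> dB
librosa.display.specshow(log_mel_spectrogram, x_axis="time", y_axis="mel", sr=sr, fmax=8000)  # same fmax as above so the axis is labelled correctly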

#2

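import numpy as np
import librosa
import librosa.display

path = "path/to/my/file"
scale, sr = librosa.load(path)
X = librosa.stft(scale)  # defaults: n_fft=2048, hop_length=512
Xdb = librosa.amplitude_to_db(np.abs(X))  # magnitude -> dB
librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')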

The respective images are:

** EDIT ** Now that I set the number of mel bins to 64, I obtain the spectrogram below:

If I want to process many such spectrograms, should I trim off the bold blue portion above, since it is common to all of them? What does the bold, dark region represent? Is it advisable to use the fmax parameter to trim it?

To make the two spectrograms more comparable, you should set n_mels to be something like 64 — Jon Nordby, May 09 '21 at 08:51

score 2 · Answer 1 · answered May 09 '21 at 08:50

The second spectrogram is not a mel spectrogram but an STFT (sometimes called "linear") spectrogram. It contains all the frequency bands from the FFT: (n_fft/2)+1 bands, i.e. 1025 for n_fft=2048 (the librosa.stft default). The mel spectrogram, by contrast, has mel filters applied, which reduce the number of bands to n_mels (typically 32-128); in your example it is set to 10.
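The band counts above follow directly from the parameters. Here is a minimal sketch using only the Python standard library rather than librosa itself. The helper names `stft_bins` and `mel_band_edges_hz` are hypothetical. The HTK mel formula is assumed for simplicity; librosa's default mel scale is the Slaney variant, so its exact band edges will differ somewhat.

```python
import math

def stft_bins(n_fft):
    """Number of frequency rows in a one-sided STFT: n_fft // 2 + 1."""
    return n_fft // 2 + 1

def hz_to_mel(f):
    # HTK formula (an assumption; librosa defaults to the Slaney variant)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of the HTK formula above
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges_hz(n_mels, fmin=0.0, fmax=8000.0):
    """Edge frequencies of n_mels triangular filters.

    The filters are defined by n_mels + 2 points spaced evenly on the mel
    scale between fmin and fmax; filter i spans edges[i] .. edges[i + 2].
    """
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]

print(stft_bins(2048))             # rows in the STFT spectrogram: 1025
edges = mel_band_edges_hz(10)
print(len(edges) - 2)              # rows in the mel spectrogram: 10
print([round(e) for e in edges])   # band edges in Hz, widening with frequency
```

With only 10 bands covering 0 to 8000 Hz, each mel row sums over a wide frequency range, which is why the first plot looks so much coarser than the 1025-row STFT plot. Calling `mel_band_edges_hz(64)` shows how much finer the spacing becomes.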

Jon Nordby

Thanks. Could you answer the edit I made to the question above? – VITTHAL BHANDARI May 09 '21 at 12:46
