MFCC spectrogram vs SciPy spectrogram

Question

I am currently working on a convolutional neural network (CNN) and have started comparing different spectrogram plots.

With regard to the Librosa plot (MFCC), the spectrogram looks very different from the other spectrogram plots. I took a look at the comment posted here about the "undetailed" MFCC spectrogram. How would I implement, in Python, the solution suggested there?

Also, would this low-resolution MFCC plot lose any nuances as the images go through the CNN?

Any help with writing the Python code mentioned here will be sincerely appreciated!

Here is my Python code for the comparison of the Spectrograms and here is the location of the wav file being analyzed.

Python Code

# Load various imports
import os
import librosa
import librosa.display
import matplotlib.pyplot as plt

import scipy.io.wavfile
#24bit accessible version
import wavfile

plt.figure(figsize=(17, 30))

filename = 'AWCK AR AK 47 Attached.wav'
librosa_audio, librosa_sample_rate = librosa.load(filename, sr=None)
plt.subplot(4,1,1)
xmin = 0
plt.title('Original Audio - 24BIT')
fig_1 = plt.plot(librosa_audio)

sr = librosa_sample_rate

plt.subplot(4,1,2)
mfccs = librosa.feature.mfcc(y=librosa_audio, sr=librosa_sample_rate, n_mfcc=40)
librosa.display.specshow(mfccs, sr=librosa_sample_rate, x_axis='time', y_axis='hz')
plt.title('Librosa Plot')
print(mfccs.shape)


plt.subplot(4,1,3)
X = librosa.stft(librosa_audio)
Xdb = librosa.amplitude_to_db(abs(X))
librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
# plt.colorbar()

# sampling rate in Hz (not the maximum frequency); unused below
Fs = 96000.

samplerate, data = scipy.io.wavfile.read(filename)
plt.subplot(4,1,4)
plt.specgram(data, Fs=samplerate)
plt.title('Scipy Plot (Fs=96000)')

plt.show()

score 3 · Accepted Answer · answered Dec 15 '20 at 13:41

MFCCs are not spectrograms (time-frequency) but "cepstrograms" (time-cepstrum). Comparing an MFCC plot with a spectrogram visually is not easy, and I am not sure it is very useful either. If you wish to do so, invert the MFCC to get back a (mel) spectrogram by applying an inverse DCT. You can probably use librosa.feature.inverse.mfcc_to_mel for that. This lets you estimate how much data was lost in the forward MFCC transformation. But it may not say much about how much information relevant to your task has been lost, or how much irrelevant noise has been removed. That needs to be evaluated for your task and dataset. The best way is to try different settings and measure performance using the evaluation metrics that you care about.

Note that MFCCs may not be such a great representation for the typical 2D CNNs that are applied to spectrograms. That is because locality has been reduced: in the MFCC domain, frequencies that are close to each other are no longer next to each other on the vertical axis. And because 2D CNNs have kernels with limited locality (typically 3x3 or 5x5 early on), this can reduce the performance of the model.
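A toy illustration of the locality point, using only NumPy and SciPy. The two frames are hypothetical log-mel columns with a single narrow peak, one mel band apart; they are not taken from real audio. After the DCT (the step that turns log-mel values into cepstral coefficients), the peak is no longer confined to one row.

```python
import numpy as np
from scipy.fft import dct

n_mels = 40

# Hypothetical log-mel frames: one narrow peak, shifted by a single band.
frame_a = np.zeros(n_mels)
frame_a[10] = 1.0
frame_b = np.zeros(n_mels)
frame_b[11] = 1.0

# Cepstral coefficients: type-II orthonormal DCT along the frequency axis.
ceps_a = dct(frame_a, type=2, norm="ortho")
ceps_b = dct(frame_b, type=2, norm="ortho")

tol = 1e-6
# Rows that carry the peak in each domain.
nonzero_mel = int(np.count_nonzero(np.abs(frame_a) > tol))
nonzero_ceps = int(np.count_nonzero(np.abs(ceps_a) > tol))

# Rows that change when the peak moves by one mel band.
changed_mel = int(np.count_nonzero(np.abs(frame_a - frame_b) > tol))
changed_ceps = int(np.count_nonzero(np.abs(ceps_a - ceps_b) > tol))

print(nonzero_mel, nonzero_ceps, changed_mel, changed_ceps)
```

In the mel domain the shift touches only two neighbouring rows, which a small kernel can see at once. In the cepstral domain the same shift changes values spread across most of the column, so a 3x3 kernel only ever sees a fragment of it.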

Thanks for your response. Sincerely appreciated! Where can I find more information about the MFCC points in your answer? I have yet to come across the information you mentioned. Can you direct me to an authoritative source or document that describes it? Thanks! — Joe, Dec 15 '20 at 15:48
For MFCC, I added some references here: https://stackoverflow.com/a/65208434/1967571 - not really authoritative but probably quite useful. I would expect any good textbook on speech recognition to cover it pretty well — Jon Nordby, Dec 15 '20 at 15:52
For the CNN-on-MFCC issues, I am not aware of an authoritative source. It's kind of "common sense", with some tangential support in empirical evaluations (CNNs on mel spectrograms do better than on MFCC). If you do find a good source for that, I would love to see it! — Jon Nordby, Dec 15 '20 at 15:53
