
I am quite new to DSP and am generating the spectrogram of an audio file. My spectrogram is not smooth: it looks quite raw, with individual pixels clearly visible, something like this:

[Image: my current, pixelated spectrogram]

I am looking for a smooth spectrogram like this instead:

[Image: the smooth spectrogram I am aiming for]

Where am I making a mistake? Is it because of the window size? My code to generate the Mel spectrogram is:

import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display


def readData(file):
    # sr=None keeps the file's native sampling rate
    origData, origSampFreq = librosa.load(file, sr=None)
    return origData, origSampFreq


def resample(originalData, origSampFreq, targetSampFreq):
    # orig_sr / target_sr are keyword-only in recent librosa versions
    resampledData = librosa.resample(originalData, orig_sr=origSampFreq, target_sr=targetSampFreq)
    return resampledData


def normalizeSound(resampledData, axis):
    """Peak-normalize the signal along `axis` (use 0 for a 1-D signal)."""
    # Assumption: `normalize` was meant to be librosa.util.normalize. Its second
    # parameter is `norm`, so `axis` has to be passed by keyword.
    normalizedData = librosa.util.normalize(resampledData, axis=axis)
    return normalizedData


def calculateMelSpectogram(normalizedData, hop_length, win_length, sr):
    # n_fft is left at librosa's default (2048); each window is zero-padded to n_fft
    S = librosa.feature.melspectrogram(y=normalizedData, sr=sr,
                                       hop_length=hop_length, win_length=win_length)
    return S

# Plot mel spectrogram

def plotMelSpectogram(S, sr, name, ref=np.max):
    plt.figure(figsize=(10, 3))
    S_dB = librosa.power_to_db(S, ref=ref)
    librosa.display.specshow(S_dB, x_axis='time', y_axis='mel', sr=sr)
    plt.colorbar(format='%+2.0f dB')
    plt.title('Mel-frequency spectrogram')
    plt.tight_layout()  # lay out before saving so the saved file uses it
    plt.savefig('./chunk_images/' + name + "mel.png", dpi=300, bbox_inches='tight')
    plt.show()


def featureExtraction(audioFile, name, targetSampFreq=16000,
                      axis=0,
                      hop_length=256,
                      win_length=512):

    y, y_sr = readData(file=audioFile)
    print(y, y_sr)
    resampledData = resample(originalData=y, origSampFreq=y_sr, targetSampFreq=targetSampFreq)
    normalizedData = normalizeSound(resampledData, axis=axis)
    S = calculateMelSpectogram(normalizedData=normalizedData, hop_length=hop_length,
                               win_length=win_length, sr=targetSampFreq)
    plotSound(soundData=normalizedData, sr=targetSampFreq, x_axis_string='time', name=name)
    plotMelSpectogram(S, sr=targetSampFreq, name=name, ref=np.max)
    return S

# Plot original time-domain data

def plotSound(soundData, sr, x_axis_string, name):
    plt.figure(figsize=(10, 3))
    # waveshow replaces waveplot, which was removed in librosa 0.10
    librosa.display.waveshow(soundData, sr=sr, x_axis=x_axis_string)
    plt.savefig('./chunk_images/' + name + "sound.png", dpi=300, bbox_inches='tight')
Aaditya Ura
  • I am really familiar with DSP, but I can't see what the problem is. Is the top image showing a zoomed-in version of the lower image without any smoothing? Can you reproduce the problem with a small sample of data (or random values) that we could work on? In any case, smoothing a Mel spectrogram can change the way it looks a lot. – Leonard Sep 26 '20 at 07:19
  • Your two images have quite different time-scales, so it is hard to compare their smoothness. – Jon Nordby Sep 26 '20 at 20:22

1 Answer


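The time resolution of a mel spectrogram is set by hop_length. 256 samples at 16 kHz is 16 ms, which is a rather high resolution; lower values mean higher resolution. You can get some smoothing by using an analysis window (win_length, which defaults to n_fft) that is considerably longer than hop_length, so that consecutive frames overlap more. librosa's default is hop_length = win_length // 4, i.e. a window 4x the hop, whereas you have only 2x (win_length=512, hop_length=256).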
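
To put numbers on this, here is a small sketch in plain Python (no librosa needed) that converts the framing parameters into milliseconds and overlap. The helper names `stft_time_params` and `n_frames_centered` are hypothetical, and the frame count assumes librosa's default `center=True` padding with an even `n_fft`.

```python
def stft_time_params(sr, hop_length, win_length):
    """Return (hop in ms, window in ms, overlap fraction) for a framing setup."""
    hop_ms = 1000.0 * hop_length / sr
    win_ms = 1000.0 * win_length / sr
    overlap = 1.0 - hop_length / win_length  # fraction shared by consecutive windows
    return hop_ms, win_ms, overlap


def n_frames_centered(n_samples, hop_length):
    """Frame count when the signal is padded by n_fft // 2 on both sides
    (librosa's center=True), assuming an even n_fft."""
    return 1 + n_samples // hop_length


sr = 16000
# Settings from the question: 16 ms hop, 32 ms window, 50% overlap
print(stft_time_params(sr, hop_length=256, win_length=512))
# Window 4x the hop (librosa's default ratio): 16 ms hop, 64 ms window, 75% overlap
print(stft_time_params(sr, hop_length=256, win_length=1024))
# Number of frames for one second of audio at hop_length=256
print(n_frames_centered(sr, hop_length=256))
```

Keeping hop_length fixed and lengthening the window leaves the number of frames unchanged but makes neighboring frames share more samples, which is what smooths the result along the time axis.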

The frequency resolution is set by n_mels, which you have not specified. More bands give finer frequency resolution. It is usually in the range of 32 to 256 bands, with 128 a typical value (also the default in librosa). If you need even finer resolution along the frequency axis, you are better off using the plain STFT, without applying a Mel filterbank.
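
To see why more mel bands means finer frequency resolution, and why the plain STFT is finer still, the sketch below computes the spacing between mel band centers. It uses the HTK mel formula, m = 2595 * log10(1 + f/700), as a stand-in: librosa's default (htk=False) is the Slaney variant, which is linear below 1 kHz, so the exact numbers differ, but the overall picture (bands widen with frequency) is the same. The helpers are hypothetical, and sr=16000 with n_fft=2048 (librosa's default n_fft) are assumed.

```python
import math


def hz_to_mel_htk(f):
    """HTK mel scale (an assumption; librosa defaults to the Slaney scale)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz_htk(m):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centers(n_mels, fmin, fmax):
    """Center frequencies (Hz) of n_mels triangular bands spaced evenly in mel.

    A triangular filterbank needs n_mels + 2 edge points; the inner n_mels
    points are the band centers.
    """
    lo, hi = hz_to_mel_htk(fmin), hz_to_mel_htk(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz_htk(lo + step * (i + 1)) for i in range(n_mels)]


def spacings(centers):
    """Distance in Hz between neighboring band centers."""
    return [b - a for a, b in zip(centers, centers[1:])]


sr, n_fft = 16000, 2048
stft_bin_hz = sr / n_fft  # every STFT bin is 7.8125 Hz wide here

for n_mels in (32, 128):
    gaps = spacings(mel_band_centers(n_mels, 0.0, sr / 2))
    print(f"n_mels={n_mels}: {gaps[0]:.1f} Hz apart at the bottom, "
          f"{gaps[-1]:.1f} Hz apart at the top")
print(f"STFT bins: {stft_bin_hz} Hz apart everywhere")
```

Going from 32 to 128 bands narrows the gaps everywhere, but even 128 bands stay far coarser than the STFT bins at high frequencies.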

By the way, you are not passing hop_length to librosa.display.specshow (which assumes hop_length=512 by default), so the time axis of the plot is likely to be wrong.
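
The effect of the missing hop_length can be checked with plain arithmetic. Frame indices are converted to seconds as frame * hop_length / sr (the same conversion librosa.frames_to_time performs when n_fft is not given), and specshow uses hop_length=512 unless told otherwise. The sketch below uses a hypothetical `frame_times` helper and a 63-frame spectrogram of one second of 16 kHz audio to show that the default stretches the time axis by a factor of two.

```python
def frame_times(n_frames, sr, hop_length):
    """Time position in seconds of each STFT frame: frame * hop_length / sr."""
    return [i * hop_length / sr for i in range(n_frames)]


sr = 16000
n_frames = 63  # 1 + 16000 // 256 frames for one second of audio

actual = frame_times(n_frames, sr, hop_length=256)   # what the data really is
assumed = frame_times(n_frames, sr, hop_length=512)  # specshow's default hop_length

print(actual[-1], assumed[-1])  # the plot would claim ~2 s instead of ~1 s

# The fix: add a hop_length parameter to plotMelSpectogram and pass it through, e.g.
# librosa.display.specshow(S_dB, x_axis='time', y_axis='mel', sr=sr, hop_length=hop_length)
```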

Jon Nordby