I generate a simple sine wave with a frequency of 100 Hz and compute an FFT to check that the measured frequency is correct.

Then I compute a melspectrogram, but I do not understand what its output means. Where do I see the frequency 100 Hz in this output? Why is the yellow bar located around bin 25?

import numpy as np
import matplotlib.pyplot as plt
import scipy.fft
import librosa

def generate_sine_wave(freq, sample_rate, duration) -> tuple[np.ndarray, np.ndarray]:
    x = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
    frequencies = x * freq
    # 2pi because np.sin takes radians
    y = np.sin(2 * np.pi * frequencies)
    return x, y

sample_rate = 1024
freq = 100
x, y = generate_sine_wave(freq, sample_rate, 2)
plt.figure(figsize=(10, 4))
plt.plot(x, y)
plt.grid(True)

fft = scipy.fft.fft(y)
fft = fft[0 : len(fft) // 2]
fft = np.abs(fft)
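# FFT bin k sits at k * sample_rate / N, so the half spectrum stops just below Nyquist
xs = np.linspace(0, sample_rate / 2, len(fft), endpoint=False)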
plt.figure(figsize=(15, 4))
plt.plot(xs, fft)
plt.grid(True)

melsp = librosa.feature.melspectrogram(sr=sample_rate, y=y)
melsp = melsp.T
plt.matshow(melsp)
plt.title('melspectrogram')
melsp_max = np.max(melsp)  # avoid shadowing the built-in max()
print('melsp.shape =', melsp.shape)
print('melsp max =', melsp_max)

If I change the frequency to 200 Hz, the yellow bar in the melspectrogram moves to around bin 50. Why?

Christoph Rackwitz
codeDom

1 Answer

librosa's melspectrogram function computes a mel-scaled spectrogram. This is the same as the usual linear-scale spectrogram, but with the frequency axis resampled to a warped mel scale.
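
To make "warped" concrete, here is a minimal pure-Python sketch of the Slaney-style mel scale, which as far as I know is what librosa uses by default (htk=False): linear below 1000 Hz and logarithmic above. The name hz_to_mel_slaney and the module-level constants are written out only for illustration; in practice you would call librosa.hz_to_mel.

```python
import math

F_SP = 200.0 / 3.0                # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0               # frequency where the log region starts
MIN_LOG_MEL = MIN_LOG_HZ / F_SP   # mel value at that frequency (15)
LOGSTEP = math.log(6.4) / 27.0    # step size in the log region


def hz_to_mel_slaney(hz):
    """Convert Hz to mel on the Slaney-style scale (illustrative helper)."""
    if hz < MIN_LOG_HZ:
        return hz / F_SP
    return MIN_LOG_MEL + math.log(hz / MIN_LOG_HZ) / LOGSTEP


# Print the mel value for a few frequencies, including the question's 100 and 200 Hz
for hz in (100, 200, 512, 1000, 6400):
    print(hz, round(hz_to_mel_slaney(hz), 4))
```

Everything in the question's example lies below 1000 Hz, so the warp is linear there. That is why doubling the frequency from 100 to 200 Hz roughly doubles the bin index.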

Relating a particular bin ("why 25?") to frequency in Hz is complicated but doable:

  1. melspectrogram maps frequency range [0, sr/2] to mel space. In your example, [0, 512] Hz maps to mel in the range 0 to 7.68 (= librosa.hz_to_mel(512)).
  2. The range is uniformly divided into 128 bins (by default). The center of the ith mel bin corresponds approximately to librosa.mel_to_hz(i * 7.68 / 127).

Then for bins 25 and 50 in particular, we can verify that they correspond to the expected frequencies:

  • librosa.mel_to_hz(25 * 7.68 / 127) = 100.7874
  • librosa.mel_to_hz(50 * 7.68 / 127) = 201.5748
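
The two steps above can be sketched without librosa, assuming the default 128 bins and the linear part of the mel scale (200/3 Hz per mel, valid only while sr/2 stays below 1000 Hz). The helper names approx_bin_center_hz and approx_bin_for_hz are hypothetical. As far as I can tell, librosa's filter bank places its triangle edges slightly differently from this even grid, so treat the result as accurate to about one bin.

```python
HZ_PER_MEL = 200.0 / 3.0  # assumption: Slaney scale, linear below 1000 Hz


def approx_bin_center_hz(i, sr=1024, n_mels=128):
    """Approximate center frequency in Hz of mel bin i (requires sr / 2 < 1000)."""
    max_mel = (sr / 2) / HZ_PER_MEL  # 7.68 for sr = 1024
    return (i * max_mel / (n_mels - 1)) * HZ_PER_MEL


def approx_bin_for_hz(hz, sr=1024, n_mels=128):
    """Approximate index of the mel bin whose center is closest to hz."""
    max_mel = (sr / 2) / HZ_PER_MEL
    return round((hz / HZ_PER_MEL) * (n_mels - 1) / max_mel)


print(approx_bin_center_hz(25), approx_bin_for_hz(100))
print(approx_bin_center_hz(50), approx_bin_for_hz(200))
```

Because the scale is linear in this range, the bins are spaced about 512 / 127, or roughly 4 Hz, apart, so the bin index is close to the frequency divided by 4.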

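For plotting, the melspectrogram documentation suggests displaying mel-scale spectrograms using librosa.display.specshow with the option y_axis='mel'. Adapted to the signal from the question (y and sample_rate as defined there, and without the transpose), that looks like:

import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display
# S is the mel spectrogram with shape (n_mels, n_frames)
S = librosa.feature.melspectrogram(y=y, sr=sample_rate)
fig, ax = plt.subplots()
S_dB = librosa.power_to_db(S, ref=np.max)
img = librosa.display.specshow(S_dB, x_axis='time',
                         y_axis='mel', sr=sample_rate,
                         fmax=sample_rate / 2, ax=ax)
fig.colorbar(img, ax=ax, format='%+2.0f dB')
ax.set(title='Mel-frequency spectrogram')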

This plots the mel spectrogram with the y axis labeled in Hz, but correctly warped for the mel scale.

Pascal Getreuer