I'm trying to plot the waveform and spectrogram of a 16000 Hz, 16-bit .wav speech recording. I have successfully obtained the plots below:

[Figure: wave plot and spectrogram of the word 'about']

However, the time value on the spectrogram is not correct. I'm certain that my sampling rate is consistent (16000Hz) throughout the program, but I still cannot get the correct time value for the spectrogram.

Below is my Python script:

import matplotlib.pyplot as plt
import librosa
import librosa.display
import numpy as np

y, sr = librosa.load('about_TTS_0792.wav', sr=16000)
print("Current audio sampling rate: ", sr)

print("Audio Duration:", librosa.get_duration(y=y, sr=sr))

D = librosa.stft(y, hop_length=64, win_length=256)  # STFT of y
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)

fig, ax = plt.subplots(nrows=2)

librosa.display.waveplot(y, sr=sr, ax=ax[0])
img = librosa.display.specshow(S_db, sr=sr, x_axis='s', y_axis='linear',ax=ax[1])
ax[1].set(title='Linear spectrogram')
fig.colorbar(img, ax=ax[1], format="%+2.f dB")
fig.tight_layout()

plt.show()

Output for this code:

Current audio sampling rate:  16000

Audio Duration: 0.792

I don't know what I have missed that can cause the inconsistent time values on the x-axis. Please help.

Hendrik
John

1 Answer

The time axis for an STFT spectrogram depends on two factors: the sample rate and the hop length.

When you compute the STFT, you specify hop_length=64, win_length=256. Note that this information is not contained in D or S_db: librosa leans more towards a functional approach than an object-oriented one, so the arrays carry no record of how they were computed.

So when you then go on to show the spectrogram using librosa.display.specshow, you have to specify the hop_length again, which you missed. Therefore the default hop_length=512 is used, which stretches the time axis by a factor of 512 / 64 = 8. That is, 0.792 * 8 = 6.336, which matches what you see in your spectrogram.
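To make the arithmetic concrete, here is a minimal pure-Python sketch of how a frame index maps to a time value. The helper name frame_times is hypothetical (it is not a librosa function), and the frame count assumes a centered STFT, which I believe is librosa's default.

```python
def frame_times(n_frames, hop_length, sr):
    """Time in seconds of each STFT frame: frame_index * hop_length / sr."""
    return [i * hop_length / sr for i in range(n_frames)]


sr = 16000
n_samples = 12672   # 0.792 s * 16000 Hz, the duration reported in the question
hop_used = 64       # hop length passed to librosa.stft in the question
hop_default = 512   # hop length specshow falls back to when none is given

# Assumption: centered STFT, so the frame count is 1 + n_samples // hop_length.
n_frames = 1 + n_samples // hop_used

# Time of the last frame when the axis is built with the right and wrong hop length.
correct_end = frame_times(n_frames, hop_used, sr)[-1]
wrong_end = frame_times(n_frames, hop_default, sr)[-1]

print("frames:", n_frames)
print("axis end with hop_length=64: ", correct_end)
print("axis end with hop_length=512:", wrong_end)
```

The number of frames is the same in both cases. Only the assumed spacing between frames changes, which is why the whole axis is scaled by 8 rather than shifted.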

Also, I believe x_axis='s' should rather be x_axis='time'.

So changing

img = librosa.display.specshow(S_db, sr=sr, x_axis='s', y_axis='linear',ax=ax[1])

to

img = librosa.display.specshow(S_db, sr=sr, hop_length=64, x_axis='time', y_axis='linear', ax=ax[1])

should fix the issue.

Hendrik