Why does a spectrogram from the librosa library show a different time duration than the actual audio track?

Question

I'm trying to plot the wave plot and spectrogram from a 16000Hz 16-bit .wav speech audio. I have successfully obtained the below plots:

However, the time value on the spectrogram is not correct. I'm certain that my sampling rate is consistent (16000Hz) throughout the program, but I still cannot get the correct time value for the spectrogram.

Below is my Python script:

import matplotlib.pyplot as plt
import librosa
import librosa.display
import numpy as np

y, sr = librosa.load('about_TTS_0792.wav', sr=16000)
print("Current audio sampling rate: ", sr)

print("Audio Duration:", librosa.get_duration(y=y, sr=sr))

D = librosa.stft(y, hop_length=64, win_length=256)  # STFT of y
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)

fig, ax = plt.subplots(nrows=2)

librosa.display.waveplot(y, sr=sr, ax=ax[0])
img = librosa.display.specshow(S_db, sr=sr, x_axis='s', y_axis='linear',ax=ax[1])
ax[1].set(title='Linear spectrogram')
fig.colorbar(img, ax=ax[1], format="%+2.f dB")
fig.tight_layout()

plt.show()

Output for this code:

Current audio sampling rate:  16000

Audio Duration: 0.792

I don't know what I have missed that can cause the inconsistent time values on the x-axis. Please help.

score 6 · Accepted Answer · answered Feb 17 '21 at 07:22

The time axis for an STFT spectrogram depends on two factors: the sample rate and the hop length.

When you compute the STFT, you specify hop_length=64, win_length=256. Note that this information is not stored in D or S_db: both are plain NumPy arrays, because librosa leans towards a functional approach rather than an object-oriented one.

So when you go on to show the spectrogram using librosa.display.specshow, you have to specify the hop_length again, which you missed. The default hop_length=512 is therefore used, which stretches the time axis by a factor of 512 / 64 = 8. That is, 0.792 * 8 = 6.336 seconds, which matches what you see in your spectrogram.
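The arithmetic above can be checked without librosa. The following is a minimal sketch in plain Python, using the numbers from the question. The helper axis_extent is hypothetical (not a librosa function), and the frame count assumes librosa's default center=True framing, where the number of frames is 1 + n_samples // hop_length. The extents are approximate, since the exact axis edge depends on how the plot places frame boundaries.

```python
# Sketch: how the hop length assumed at display time scales the time axis.
# Numbers come from the question; axis_extent is a hypothetical helper.

sr = 16000                          # sample rate in Hz
duration = 0.792                    # seconds, as printed by get_duration
n_samples = round(duration * sr)    # 12672 samples

stft_hop = 64                       # hop_length passed to librosa.stft
default_hop = 512                   # hop_length assumed when none is given

# Assumption: center=True framing, so frames = 1 + n_samples // hop_length.
n_frames = 1 + n_samples // stft_hop


def axis_extent(n_frames, hop_length, sr):
    """Approximate right edge of the time axis, in seconds."""
    return n_frames * hop_length / sr


correct = axis_extent(n_frames, stft_hop, sr)     # hop used for the STFT
wrong = axis_extent(n_frames, default_hop, sr)    # hop assumed by default

print(f"frames: {n_frames}")
print(f"axis extent with hop_length=64:  {correct:.3f} s")
print(f"axis extent with hop_length=512: {wrong:.3f} s")
print(f"stretch factor: {wrong / correct:.1f}")
```

The same set of frames is mapped to a time axis eight times longer when the display step assumes a hop of 512 samples instead of 64, which is why the hop length has to be passed to both the STFT call and the plotting call.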

Also, I believe x_axis='s' should rather be x_axis='time'.

So changing

img = librosa.display.specshow(S_db, sr=sr, x_axis='s', y_axis='linear',ax=ax[1])

to

img = librosa.display.specshow(S_db, sr=sr, hop_length=64, x_axis='time', y_axis='linear', ax=ax[1])

should fix the issue.

Clear explanation. Adding hop_length fixed the error. Thank you Hendrik. — John, Feb 17 '21 at 07:48
