Why does a spectrogram from the librosa library have twice the time duration of the actual audio track?

Question

I am using the following code to obtain a Mel spectrogram from a recorded audio signal of about 30 s:

spectrogram =  librosa.feature.melspectrogram(y=self.RawSamples,sr=self.SamplingFrequency, n_mels=128, fmax=8000)

if show:
    plt.figure(figsize=(10, 4))
    librosa.display.specshow(librosa.power_to_db(self.Spectrogram, ref=np.max), y_axis='mel', fmax=8000, x_axis='time')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Mel spectrogram')
    plt.tight_layout()

Obtained spectrogram: (image: Mel spectrogram)

Can you please explain why the time axis shows twice the actual duration (it should be 30 s)? What is wrong with the code?

OK - so if you are treating the samples as a single channel then you will get twice the duration. — Paul R, Jul 12 '18 at 15:36
Do you know if there's any attribute to set when I call the spectrogram method from librosa in order to avoid this? Btw, thank you so much for answering, it's helping a lot @PaulR — LiukPet, Jul 12 '18 at 15:40
I'm not familiar with the particular library, but it should be fairly trivial to either extract a single (left or right) channel, or combine both channels into a single (mono) channel, and then process that. — Paul R, Jul 12 '18 at 15:46
Perhaps try [`librosa.core.to_mono`](https://librosa.github.io/librosa/generated/librosa.core.to_mono.html) ? — Paul R, Jul 12 '18 at 15:52

score 3 · Accepted Answer · answered Jul 15 '18 at 18:06

You need to pass the sampling rate to librosa.display.specshow (sr=self.SamplingFrequency). Otherwise it defaults to 22050, and if self.SamplingFrequency is a different value, the time axis will show the wrong length.
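
To see where a factor of two can come from, here is a minimal sketch of the time-axis arithmetic. It assumes the recording was sampled at 44100 Hz (the question does not state the rate) and that the default hop length of 512 was used. `frames_to_seconds` is a hypothetical helper that mirrors the `frames * hop_length / sr` conversion librosa applies to frame indices.

```python
def frames_to_seconds(n_frames, sr, hop_length=512):
    """Convert a frame count to seconds: frames * hop_length / sr."""
    return n_frames * hop_length / sr


true_sr = 44100      # ASSUMPTION: actual recording rate, not given in the question
duration_s = 30      # length of the recording from the question
hop_length = 512     # librosa's default hop length

n_samples = true_sr * duration_s
# Frame count produced by librosa's default centered framing.
n_frames = 1 + n_samples // hop_length

# Axis length when the plot is told the real sampling rate.
correct = frames_to_seconds(n_frames, true_sr, hop_length)
# Axis length when the plot falls back to the default sr of 22050.
wrong = frames_to_seconds(n_frames, 22050, hop_length)

print(f"with sr={true_sr}: {correct:.2f} s")
print(f"with sr=22050: {wrong:.2f} s")
```

Under these assumptions the spectrogram data is the same in both cases; only the labelling of the time axis changes, by the ratio of the true rate to the assumed one.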

Jon Nordby

I'm working on 90 seconds of audio with a 12000 sample rate. The spectrogram shows the wrong time duration, but the spectrogram itself is actually right, right? – LotOfQuestion Dec 30 '19 at 07:17
Quite possible. But best to pass the right options to specshow to verify that things are as you expect – Jon Nordby Dec 30 '19 at 13:52
My code looks like this:
`data, sr = librosa.load(file_name, sr=None, duration=90)`
`D = librosa.amplitude_to_db(np.abs(librosa.stft(data)), ref=np.max)`
`librosa.display.specshow(D, y_axis='linear', x_axis='time', sr=12000)`
Am I passing the wrong value? – LotOfQuestion Dec 31 '19 at 01:50
That looks about right. But if you have an issue you should open a new question – Jon Nordby Dec 31 '19 at 09:48
Now the spectrogram shows the right duration but the wrong sample rate:(. Thank you anyway:) – LotOfQuestion Jan 01 '20 at 14:06

vskadandale · Answer 2 · 2021-04-06T18:15:50.330

Your call to librosa.display.specshow should include the sampling rate, sr=<your_sampling_rate>, and the hop length, hop_length=<your_hop_length>. The default values for these parameters are 22050 and 512, respectively. If they do not match the values used to compute the spectrogram, the x-axis of the resulting plot will be wrong.

Reference: http://man.hubwiz.com/docset/LibROSA.docset/Contents/Resources/Documents/generated/librosa.display.specshow.html
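
The hop length affects the axis in the same way as the sampling rate. The sketch below is an illustration only: `axis_duration` is a hypothetical helper, and the hop of 256 is an assumed example value, not something taken from the question. It shows how a spectrogram computed with one hop length but plotted with the default of 512 ends up with a stretched time axis.

```python
def axis_duration(n_samples, stft_hop, plot_sr, plot_hop):
    """Seconds spanned by the time axis of a spectrogram plot.

    stft_hop is the hop used to compute the spectrogram;
    plot_sr and plot_hop are the values the plotting call assumes.
    """
    n_frames = 1 + n_samples // stft_hop  # centered framing
    return n_frames * plot_hop / plot_sr


sr = 22050
n_samples = sr * 30   # a 30 s signal
stft_hop = 256        # ASSUMPTION: hop length chosen for illustration

# Plot told the same hop length that produced the spectrogram.
matched = axis_duration(n_samples, stft_hop, sr, plot_hop=stft_hop)
# Plot left at the default hop length of 512.
mismatched = axis_duration(n_samples, stft_hop, sr, plot_hop=512)

print(f"hop_length passed to the plot: {matched:.2f} s")
print(f"hop_length left at 512: {mismatched:.2f} s")
```

In practice this means passing the same `sr` and `hop_length` to `librosa.display.specshow` that were used when computing the spectrogram.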
