Librosa Mel-Spectrogram log Shape

Question

I am extracting log Mel-spectrograms from the GTZAN dataset using Librosa in Python. My code:

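import os
import librosa

data, sampling_rate = librosa.load(os.path.join(dir, folder, file))
mel = librosa.feature.melspectrogram(y=data, sr=sampling_rate, hop_length=512 // 2, n_fft=512, n_mels=64)
mel = librosa.power_to_db(mel)  # melspectrogram already returns power (power=2.0 by default), so no extra squaring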

This runs without errors, but the Mel-spectrograms differ in size. Most of the log Mel-spectrograms have 2586 frames, and a few have between 2590 and 2620.

I noticed the size difference after taking the log of the Mel-spectrogram. How can the sizes differ after taking the log when all the audio clips are the same length?

Any suggestions? Thanks.

score 0 · Answer 1 · answered Dec 07 '19 at 11:05


The audio files probably vary slightly in length, which is common in datasets. You could truncate all spectrograms to the shortest common length (2586 frames):

mel = mel[:, :2586]
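To illustrate why small differences in clip length change the frame count, here is a minimal sketch of the relation between sample count and frame count. It assumes Librosa's default `center=True` framing, under which the number of frames is `1 + n_samples // hop_length`. The helper name `expected_frames` and the sample counts are hypothetical, not measured from GTZAN. Taking the log with `power_to_db` is element-wise, so it should not change the shape by itself.

```python
def expected_frames(n_samples: int, hop_length: int = 256) -> int:
    """Frame count for a centered STFT (assumes librosa's default center=True)."""
    return 1 + n_samples // hop_length


# Hypothetical sample counts: clips of nominally equal duration
# can still differ by a few thousand samples.
for n_samples in (661_794, 663_000, 670_000):
    print(n_samples, "->", expected_frames(n_samples), "frames")
```

Printing `len(data)` for a file with 2586 frames and for one with more frames is a quick way to check whether this is what happens in your data.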


Jon Nordby


Yes, that can work. However, I set the total length to 2700 and repeated the last value of the Mel-spectrogram until the index reached 2700. It works well. Thank you for the answer. – Dec 09 '19 at 04:53
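Both approaches from this thread (truncating, or repeating the last frame up to a fixed length) can be sketched with NumPy alone. This is only an illustration: the helper name `fix_frames` is hypothetical, the random array stands in for a real log Mel-spectrogram, and the input is assumed to have shape `(n_mels, n_frames)`.

```python
import numpy as np


def fix_frames(mel: np.ndarray, target: int) -> np.ndarray:
    """Truncate or edge-pad a (n_mels, n_frames) array to exactly `target` frames."""
    n_frames = mel.shape[1]
    if n_frames >= target:
        # Too long (or exact): keep only the first `target` frames.
        return mel[:, :target]
    # Too short: mode="edge" repeats the last column until `target` is reached.
    return np.pad(mel, ((0, 0), (0, target - n_frames)), mode="edge")


mel = np.random.rand(64, 2586)   # stand-in for a real log Mel-spectrogram
padded = fix_frames(mel, 2700)   # padded by repeating the last frame
cropped = fix_frames(mel, 2000)  # truncated
print(padded.shape, cropped.shape)
```

Padding keeps every frame of the longer clips, while truncation avoids adding artificial frames. Which is preferable depends on the model consuming the features.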
