
I am extracting log Mel-spectrograms from the GTZAN dataset using Librosa in Python. My code:

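import os
import librosa

data, sampling_rate = librosa.load(os.path.join(dir, folder, file))
mel = librosa.feature.melspectrogram(y=data, sr=sampling_rate, n_fft=512, hop_length=512 // 2, n_mels=64)
mel = librosa.power_to_db(mel)  # melspectrogram already returns a power spectrogram (power=2.0 by default), so no extra squaring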

It works, but the Mel-spectrograms do not all have the same size. Most of the log Mel-spectrograms have 2586 frames, while a few have between 2590 and 2620.

I checked, and the sizes differ after taking the log of the Mel-spectrogram. How can they differ in size when all the audio clips are the same length?

Any suggestions? Thanks.

1 Answer


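Most likely the audio files vary slightly in length, which is common in datasets. The number of frames depends directly on the number of samples, while taking the log (power_to_db) works element-wise and does not change the shape. You should probably truncate all spectrograms to the shortest common length (2586 frames).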

mel = mel[:, :2586]  # keep the first 2586 frames (time axis)
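To make the link between clip length and spectrogram width concrete, here is a minimal sketch, assuming librosa's default centered STFT (center=True) and the hop_length=256, n_fft=512 from the question. With center=True the signal is padded by n_fft // 2 on both sides, so for an even n_fft the frame count reduces to 1 + n_samples // hop_length. The helper names `expected_frames` and `crop_to_shortest` are hypothetical, and only NumPy is used so the arithmetic stays explicit.

```python
import numpy as np


def expected_frames(n_samples: int, hop_length: int = 256) -> int:
    """Frame count of a centered STFT (librosa's default center=True).

    Padding by n_fft // 2 on each side means that, for an even n_fft,
    the count is 1 + n_samples // hop_length, independent of n_fft.
    """
    return 1 + n_samples // hop_length


def crop_to_shortest(mels):
    """Truncate every (n_mels, n_frames) array to the smallest frame count and stack them."""
    shortest = min(m.shape[1] for m in mels)
    return np.stack([m[:, :shortest] for m in mels])


# A clip that is only 256 samples longer already gets one more frame.
print(expected_frames(661760))  # 2586
print(expected_frames(662016))  # 2587
```

Assuming librosa.load's default 22050 Hz sample rate (the question does not pass sr), every extra 256 samples (about 11.6 ms) adds one frame, so the spread from 2586 to 2620 frames corresponds to roughly 0.4 s of extra audio.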
Jon Nordby
Yeah, that can work. Instead, I set the total length to 2700 and padded each Mel-spectrogram by copying its last value until the index reaches 2700. It works well. Thanks for the answer. – Dec 09 '19 at 04:53
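The padding approach from the comment can be sketched with NumPy's edge padding. This is a minimal sketch, assuming "copy the last value" means repeating the last frame (column) of each (n_mels, n_frames) array; `pad_or_crop_edge` is a hypothetical helper name and 2700 is the target length from the comment. It also truncates inputs that are longer than the target, so every output has exactly the same width.

```python
import numpy as np


def pad_or_crop_edge(mel: np.ndarray, target: int = 2700) -> np.ndarray:
    """Return an array with exactly `target` frames along axis 1.

    Assumption: shorter inputs are extended by repeating the last frame
    (np.pad with mode="edge"); longer inputs are truncated.
    """
    n_frames = mel.shape[1]
    if n_frames >= target:
        return mel[:, :target]
    # Pad only the time axis, and only at the end.
    return np.pad(mel, ((0, 0), (0, target - n_frames)), mode="edge")


mel = np.random.rand(64, 2586)
fixed = pad_or_crop_edge(mel)
print(fixed.shape)  # (64, 2700)
```

Edge padding avoids the sharp drop to a constant that zero padding would introduce in a dB-scaled spectrogram, whereas truncation (as in the answer) discards a little audio but adds no synthetic frames.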