I'm trying to use Mel spectrograms of audio files of varying lengths as input to an automatic speech recognition system. Each Mel spectrogram has shape (128, x), where x differs from file to file depending on the audio length.
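To illustrate why x varies, here is a minimal sketch of the frame-count arithmetic. It assumes librosa's default `center=True` framing and an even `n_fft`, in which case the number of frames works out to `1 + n_samples // hop_length`. The helper name `n_frames` and the 22050 Hz sample rate are only illustrative assumptions, not taken from the actual data.

```python
def n_frames(n_samples, hop_length=512):
    """Frame count for a centered STFT with an even n_fft (assumed setup)."""
    # With centering, the signal is padded by n_fft // 2 on both sides,
    # so the count depends only on the signal length and the hop length.
    return 1 + n_samples // hop_length


sr = 22050  # assumed sample rate, for illustration only
frames_by_duration = {sec: n_frames(sec * sr) for sec in (1, 3, 10)}

for sec, frames in frames_by_duration.items():
    print(f"{sec:>2} s -> spectrogram shape (128, {frames})")
```

Under these assumptions, x grows linearly with the audio duration, so changing `n_fft` or `n_mels` alone would not make it constant.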

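import librosa
import numpy as np

# y and sr are the waveform and its sample rate (e.g. as returned by librosa.load)
n_fft = 2048
hop_length = 512
n_mels = 128

# Mel power spectrogram, shape (n_mels, x)
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
# Convert power to decibels, relative to the loudest bin
S_DB = librosa.power_to_db(S, ref=np.max)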

This is the code I'm using, with the parameter values shown. I tried allocating an empty fixed-length array for each spectrogram and copying S_DB into it, but the classifier's accuracy was very low. Is there a parameter I can change to get a fixed-size array regardless of audio length?
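For reference, a minimal sketch of the pad-or-truncate approach described above, using NumPy only. The helper `pad_or_truncate`, the target of 200 frames, and the random stand-in arrays are hypothetical. The pad value of -80 dB is an assumption based on `power_to_db` defaults: with `ref=np.max` and the default `top_db=80.0`, values lie in [-80, 0], so padding with zeros would fill the tail with the loudest possible level rather than silence.

```python
import numpy as np


def pad_or_truncate(spec_db, target_frames, pad_value=-80.0):
    """Return a (n_mels, target_frames) version of a dB spectrogram."""
    n_frames = spec_db.shape[1]
    if n_frames >= target_frames:
        # Too long: keep only the first target_frames columns.
        return spec_db[:, :target_frames]
    # Too short: pad the time axis on the right with the "silence" level.
    pad_width = target_frames - n_frames
    return np.pad(spec_db, ((0, 0), (0, pad_width)),
                  mode="constant", constant_values=pad_value)


# Random stand-ins for two S_DB arrays of different lengths (illustrative only).
rng = np.random.default_rng(0)
short_spec = rng.uniform(-80.0, 0.0, size=(128, 50))
long_spec = rng.uniform(-80.0, 0.0, size=(128, 300))

fixed_short = pad_or_truncate(short_spec, 200)
fixed_long = pad_or_truncate(long_spec, 200)
print(fixed_short.shape, fixed_long.shape)
```

Whether padding, truncating, or a model that accepts variable-length input is appropriate depends on the classifier, so this is only one possible way to get a fixed shape.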

Additionally, any other suggestions for better results while still using Mel spectrograms are welcome.

The Wolf
