- I am new to speech recognition.
- I plan to extract the Mel spectrogram of the audio data, but when I print `img.shape` I find that its shape is `(650, 20000, 4)`. I don't know why the last dimension is 4.
Below is my code.
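import librosa
import librosa.display
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

def read_wav_data(filename):
    # sr=None keeps the file's native sampling rate
    y, sr = librosa.load(filename, sr=None)
    return y, sr

def GetFrequencyFeature5(y, sr):
    # Recent librosa versions require y and sr to be passed as keywords
    melspec = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=16, n_mels=32, fmin=50, fmax=350)
    logmelspec = librosa.power_to_db(melspec)
    print(logmelspec)
    print(logmelspec.shape)
    plt.figure()
    librosa.display.specshow(logmelspec, sr=sr, hop_length=16, x_axis='time', y_axis='mel', fmin=50, fmax=350)
    plt.title('Beat waveform')
    # Save the rendered figure; the file name must be a string
    plt.savefig('file.png')
    # img = mpimg.imread('file.png')
    plt.show()
    # return img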
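
A possible explanation, offered as a sketch rather than a confirmed diagnosis: if `img` comes from reading the saved PNG back with `mpimg.imread`, the array is the rendered figure in pixels, not the feature matrix, and PNG files written by `savefig` normally carry four channels (red, green, blue, alpha). The example below assumes that setup. It uses a random array (`fake_logmelspec`, a hypothetical stand-in for `logmelspec`) and an in-memory buffer instead of `file.png`, so it runs without librosa or an audio file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is opened
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for a log-mel spectrogram: 32 mel bands x 200 frames.
fake_logmelspec = np.random.default_rng(0).normal(size=(32, 200))

# Render the array as a figure and save it as a PNG into memory.
fig = plt.figure()
plt.imshow(fake_logmelspec, aspect="auto", origin="lower")
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
buf.seek(0)

# Reading the PNG back yields pixels of the whole figure, with RGBA channels.
img = mpimg.imread(buf, format="png")
print(fake_logmelspec.shape)  # the feature matrix: (n_mels, n_frames)
print(img.shape)              # (height_px, width_px, 4)

# If an image really is wanted, the alpha channel can be dropped.
rgb = img[..., :3]
print(rgb.shape)
```

If the goal is features for a model, returning `logmelspec` directly from the function is likely the simpler route, since its shape is `(n_mels, n_frames)` and it does not depend on figure size or DPI.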