  1. I am new to speech recognition.
  2. I want to extract the Mel spectrogram of my audio data, but when I print img.shape the result is (650, 20000, 4). I don't understand why the last dimension is 4.

Below is my code.

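import librosa
import librosa.display
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

def read_wav_data(filename):
    # sr=None keeps the file's native sampling rate instead of resampling
    y, sr = librosa.load(filename, sr=None)
    return y, sr

def GetFrequencyFeature5(y, sr):
    # Pass y and sr by keyword; recent librosa versions reject positional use
    melspec = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=16, n_mels=32, fmin=50, fmax=350)
    logmelspec = librosa.power_to_db(melspec)
    print(logmelspec)
    print(logmelspec.shape)
    plt.figure()
    # hop_length must match the value used above so the time axis is labeled correctly
    librosa.display.specshow(logmelspec, sr=sr, hop_length=16, x_axis='time', y_axis='mel', fmin=50, fmax=350)
    plt.title('Beat waveform')
    # Save the rendered figure (not the feature array) as a PNG
    plt.savefig('file.png')
    # img = mpimg.imread('file.png')
    plt.show()
    # return img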
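
One possible explanation, offered as an assumption rather than a confirmed diagnosis: img is read back from the saved PNG, so its shape describes the pixels of the rendered figure, not the Mel spectrogram itself. A PNG written by matplotlib normally carries four channels (red, green, blue, alpha). The sketch below uses a random array as a stand-in for logmelspec, and the helper name saved_plot_shape is hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np


def saved_plot_shape(spec, dpi=100):
    """Plot a 2-D array, save it as a PNG, read it back, return the image shape."""
    fig, ax = plt.subplots(figsize=(4, 3), dpi=dpi)
    ax.imshow(spec, aspect="auto", origin="lower")
    path = os.path.join(tempfile.mkdtemp(), "file.png")
    fig.savefig(path, dpi=dpi)
    plt.close(fig)
    img = mpimg.imread(path)  # pixels of the rendered figure
    return img.shape


# Stand-in for logmelspec: 32 Mel bands by 200 frames (random values, not real audio)
spec = np.random.rand(32, 200)
shape = saved_plot_shape(spec)

print(spec.shape)  # the feature array is 2-D: (32, 200)
print(shape)       # (height_px, width_px, channels); channels is 4 for an RGBA PNG
```

If this is what is happening, the first two entries of img.shape depend on the figure size and DPI, and using logmelspec directly as the feature avoids the round trip through an image file.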

Hendrik
赵若琰
  • Does librosa print any warnings about Mel bands? `fmax` seems very low. Have you tried increasing it, perhaps to 4000? When you do that, does the shape of your Mel spectrogram change? – Hendrik Jan 23 '20 at 20:49
  • Thank you for your answer. There is no warning when extracting the Mel spectrogram. We chose [50, 350] Hz for this task in order to cover the F0 range of human tones. I also tried changing fmax to 2000, and the third dimension is still 4. – 赵若琰 Jan 28 '20 at 02:02
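
For reference, the array returned by librosa.feature.melspectrogram for a mono signal is two-dimensional, and its shape depends on n_mels, the signal length, and hop_length, not on fmin or fmax. A minimal sketch of that arithmetic, assuming librosa's default center=True padding; the helper name expected_mel_shape is hypothetical and not part of librosa.

```python
def expected_mel_shape(n_samples, hop_length=16, n_mels=32):
    """Expected (n_mels, n_frames) for a mono signal, assuming center=True."""
    # With centered frames, one frame starts every hop_length samples,
    # plus one frame at the very beginning of the signal.
    n_frames = 1 + n_samples // hop_length
    return (n_mels, n_frames)


# One second of audio at 16 kHz with the parameters used in the question
print(expected_mel_shape(16000))  # (32, 1001)
```

Changing fmax moves the Mel filter edges but leaves both dimensions untouched, which is consistent with the shape staying the same when fmax is raised to 2000.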

0 Answers