
I am building a speech recognition model.

After training the model on .wav files (mono, 16000 Hz sampling rate), I tried to test it with a recorded audio clip. The recording has the same parameters as the audio files the model was trained on (.wav, mono, 16000 Hz sampling rate) and is 1 second long.

But I got this error:

Traceback (most recent call last):
  File "Testing.py", line 21, in <module>
    print(predict(samples))
  File "Testing.py", line 13, in predict
    prob=model.predict(audio.reshape(1,8000,1))
ValueError: cannot reshape array of size 15183 into shape (1,8000,1)

This is the code used for testing:

import numpy as np
import librosa
import IPython.display as ipd
from keras.models import load_model  # or tensorflow.keras.models

model = load_model('/home/moataz-beheta/Desktop/speech/Model/best_model.hdf5')
filepath = '/home/moataz-beheta/Desktop/speech/input/Testing'

def predict(audio):
    prob = model.predict(audio.reshape(1, 8000, 1))  # model expects a (1, 8000, 1) input
    index = np.argmax(prob[0])
    return classes[index]  # return the label; classes is the label list defined during training

# reading the voice command
samples, sample_rate = librosa.load(filepath + '/' + 'PTT-20200625-WA0035.wav', sr=16000)
samples = librosa.resample(samples, sample_rate, 8000)  # downsample from 16 kHz to 8 kHz
ipd.Audio(samples, rate=8000)  # listen to the clip (renders in a notebook)
print(predict(samples))

So, how can I solve it?

  • Your `audio`, whatever it is, apparently doesn't have a proper shape. Clearly it is an array of size 15183, and that number cannot be reshaped into `(1, 8000, 1)`: the product of the reshape dimensions must equal the number of elements in your array. – Alireza Jun 25 '20 at 22:40
  • So, how can I handle it? – Moataz Beheta Jun 26 '20 at 00:46
  • It is related to your problem! It's not a technical issue in nature. It is as if you tried to reshape the array `[1, 2, 3, 4, 5]` to the shape `(3, 2)`, which is impossible because you only have 5 elements but need 6 to fill a 3x2 array. So you should either do something to your `audio` variable so its length complies with the reshape, or change the reshaping dimensions... I don't know which one; it depends on what your code does and what the story behind it is (a short demo and a padding sketch follow these comments). – Alireza Jun 26 '20 at 08:42
  • Ah, now I see... your `wav` file is supposed to be 1 sec long at a 16k sample rate, but it deviates slightly, either in sample rate or in length, since it is 817 samples short of 16k. As far as I know, a proper ML model for this task shouldn't depend so tightly on the raw sample length, so I'd use an embedding or feature-extraction layer that transforms the signal regardless of its length... so I think the right move is to apply such an approach to your model. – Alireza Jun 26 '20 at 08:47
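
To illustrate the reshape constraint described in the comments, here is a minimal standalone NumPy demo (not tied to the model; the sizes are taken from the traceback above):

import numpy as np

a = np.arange(5)           # 5 elements
# a.reshape(3, 2)          # ValueError: 5 elements cannot fill a 3x2 (= 6 element) array

b = np.arange(15183)       # same size as the loaded audio in the traceback
# b.reshape(1, 8000, 1)    # fails for the same reason: 1 * 8000 * 1 != 15183

c = np.arange(8000)
print(c.reshape(1, 8000, 1).shape)   # (1, 8000, 1) -- works because the sizes match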
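
Following the suggestion to make the signal length comply with the model's input, here is a minimal sketch. It assumes the model expects exactly 8000 samples (1 second at 8 kHz); to_fixed_length is a hypothetical helper, and samples / predict are the variables from the question's code:

import numpy as np

def to_fixed_length(audio, target_len=8000):
    # pad with zeros if the clip is too short, truncate if it is too long
    if len(audio) < target_len:
        return np.pad(audio, (0, target_len - len(audio)))
    return audio[:target_len]

samples_fixed = to_fixed_length(samples)   # samples: the resampled 8 kHz clip
print(samples_fixed.shape)                 # (8000,) -- reshaping to (1, 8000, 1) now succeeds
print(predict(samples_fixed))

The alternative mentioned in the last comment, a feature-extraction layer that is independent of the input length, would require changing the model itself rather than the test script.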
