I am trying to train a CNN to recognize emotion in speech. For this I am using mel-frequency cepstral coefficients (MFCCs), which represent each audio file as a two-dimensional array (number of frames × number of MFCC coefficients). I want a 3-dimensional array as input for my CNN's convolution layer, where the third dimension is the number of audio files. How can I get such an array?

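import numpy as np
import scipy.io.wavfile as wav  # assumed source of wav.read
from python_speech_features import mfcc  # assumed source of mfcc

for i in range(len(audio_list)):
    rate, sig = wav.read(source_folder + audio_list[i])
    # mfcc returns a 2D array: (number of frames, number of coefficients)
    inputs = mfcc(sig, rate, nfft=1300)
    # Attempt to turn it into a 3D array
    train_inputs[i] = np.asarray(inputs[np.newaxis, :])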
ness_cons

1 Answer

If inputs is a list, first convert it to a NumPy array with np.array(inputs).

I think what you're trying to do is this:

train_inputs[i] = inputs.reshape((1,inputs.shape[0],inputs.shape[1]))

This line wraps the 2D matrix in an extra leading axis of length 1, so an array of shape (frames, coefficients) becomes (1, frames, coefficients). The data itself is unchanged.
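To make the shapes concrete, here is a minimal sketch of how the reshaped matrices could be combined into one 3D array. The random arrays are hypothetical stand-ins for the mfcc output, and the sketch assumes every file yields the same number of frames (5 frames × 13 coefficients here); files of different lengths would need padding or truncation first.

```python
import numpy as np

# Hypothetical stand-ins for the mfcc output of three audio files.
# Assumption: every file gives the same shape (5 frames x 13 coefficients).
num_files, num_frames, num_coeffs = 3, 5, 13
features = [np.full((num_frames, num_coeffs), float(i)) for i in range(num_files)]

# Add a leading axis of length 1 to each matrix: (5, 13) -> (1, 5, 13)
expanded = [x.reshape((1, x.shape[0], x.shape[1])) for x in features]

# Join along that new axis so the first dimension indexes the audio file
train_inputs = np.concatenate(expanded, axis=0)

# np.stack does both steps at once and gives the same result
stacked = np.stack(features)

print(train_inputs.shape)
```

Here the file index is the first axis rather than the third. If the network expects it last, np.moveaxis(train_inputs, 0, -1) would rearrange the axes.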

ashish-ucsb