I am trying to train a CNN to recognize emotion in speech. For this I am using mel-frequency cepstral coefficients (MFCCs), which represent each audio file as a two-dimensional array (number of frames × number of MFCC coefficients). I want a 3-dimensional array as input for my CNN convolution layer, where the third dimension is the number of audio files. How can I get such an array?
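    import numpy as np
    import scipy.io.wavfile as wav
    from python_speech_features import mfcc  # assumed source of mfcc()

    train_inputs = [None] * len(audio_list)
    for i in range(len(audio_list)):
        rate, sig = wav.read(source_folder + audio_list[i])
        inputs = mfcc(sig, rate, nfft=1300)  # shape: (frames, coefficients)
        # Add a leading axis so each file becomes (1, frames, coefficients)
        train_inputs[i] = np.asarray(inputs)[np.newaxis, :]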
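
For reference, here is a minimal sketch of the kind of result I am after, not a definitive solution. It assumes that audio files of different lengths give different numbers of frames, so the 2D arrays have to be zero-padded (or truncated) to a common length before they can be stacked. The helper name `stack_mfcc`, the `max_frames` parameter, and the random stand-in data are all hypothetical; real input would be the per-file MFCC arrays.

```python
import numpy as np

def stack_mfcc(features, max_frames=None):
    """Zero-pad 2D MFCC arrays along the frame axis and stack them.

    features: list of arrays shaped (n_frames_i, n_coeffs).
    Returns an array shaped (n_files, max_frames, n_coeffs).
    """
    if max_frames is None:
        max_frames = max(f.shape[0] for f in features)
    n_coeffs = features[0].shape[1]
    batch = np.zeros((len(features), max_frames, n_coeffs), dtype=np.float32)
    for i, f in enumerate(features):
        n = min(f.shape[0], max_frames)  # truncate files longer than max_frames
        batch[i, :n, :] = f[:n]
    return batch

# Stand-in data (assumption): three "files" with different frame counts,
# 13 coefficients each, in place of real MFCC output.
rng = np.random.default_rng(0)
fake_features = [rng.normal(size=(n, 13)) for n in (40, 55, 32)]

batch = stack_mfcc(fake_features)
print(batch.shape)  # (3, 55, 13)

# If the file index should be the last axis instead of the first:
files_last = np.moveaxis(batch, 0, -1)
print(files_last.shape)  # (55, 13, 3)
```

Most CNN libraries seem to expect the sample (file) index as the first axis, which is why the sketch stacks files first and only moves the axis to the end as an optional step.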