I am trying to extract features from .wav files by using MFCC's of the sound files. I am getting an error when I try to convert my list of MFCC's to a numpy array. I am quite sure that this error is occurring because the list contains MFCC values with different shapes (But am unsure of how to solve the issue).
I have looked at 2 other stackoverflow posts, however these don't solve my problem because they are too specific to a certain task.
ValueError: could not broadcast input array from shape (128,128,3) into shape (128,128)
Value Error: could not broadcast input array from shape (857,3) into shape (857)
Full Error Message:
Traceback (most recent call last): File "/..../.../...../Batch_MFCC_Data.py", line 68, in X = np.array(MFCCs) ValueError: could not broadcast input array from shape (20,590) into shape (20)
Code Example:
all_wav_paths = glob.glob('directory_of_wav_files/**/*.wav', recursive=True)
np.random.shuffle(all_wav_paths)
MFCCs = [] #array to hold all MFCC's
labels = [] #array to hold all labels
for i, wav_path in enumerate(all_wav_paths):
individual_MFCC = MFCC_from_wav(wav_path)
#MFCC_from_wav() -> returns the MFCC coefficients
label = get_class(wav_path)
#get_class() -> returns the label of the wav file either 0 or 1
#add features and label to the array
MFCCs.append(individual_MFCC)
labels.append(label)
#Must convert the training data to a Numpy Array for
#train_test_split and saving to local drive
X = np.array(MFCCs) #THIS LINE CRASHES WITH ABOVE ERROR
# binary encode labels
onehot_encoder = OneHotEncoder(sparse=False)
Y = onehot_encoder.fit_transform(labels)
#create train/test data
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(MFCCs, Y, test_size=0.25, random_state=0)
#saving data to local drive
np.save("LABEL_SAVE_PATH", Y)
np.save("TRAINING_DATA_SAVE_PATH", X)
Here is a snapshot of the shape of the MFCC's (from .wav files) in the MFCCs array
The MFCCs array contains with the following shapes :
...More above...
(20, 423) #shape of returned MFCC from one of the .wav files
(20, 457)
(20, 1757)
(20, 345)
(20, 835)
(20, 345)
(20, 687)
(20, 774)
(20, 597)
(20, 719)
(20, 1195)
(20, 433)
(20, 728)
(20, 939)
(20, 345)
(20, 1112)
(20, 345)
(20, 591)
(20, 936)
(20, 1161)
....More below....
As you can see, the MFCC's in the MFCCs array don't all have the same shape, and this is because the recordings are not all the same lengths of time. Is this the reason why I can't convert the array to a numpy array? If this is the issue, how do I fix this issue to have the same shape throughout the MFCC array?
Any code snippets for accomplishing this and advice would be greatly appreciated!
Thanks!