
I am trying to extract features from .wav files by computing the MFCCs of the sound files. I get an error when I try to convert my list of MFCCs to a numpy array. I am fairly sure the error occurs because the list contains MFCC arrays with different shapes, but I am unsure how to solve it.

I have looked at two other Stack Overflow posts (listed below), but they don't solve my problem because they are too specific to a particular task.

ValueError: could not broadcast input array from shape (128,128,3) into shape (128,128)

Value Error: could not broadcast input array from shape (857,3) into shape (857)

Full Error Message:

Traceback (most recent call last):
  File "/..../.../...../Batch_MFCC_Data.py", line 68, in
    X = np.array(MFCCs)
ValueError: could not broadcast input array from shape (20,590) into shape (20)

Code Example:

import glob
import numpy as np
from sklearn.preprocessing import OneHotEncoder

all_wav_paths = glob.glob('directory_of_wav_files/**/*.wav', recursive=True)
np.random.shuffle(all_wav_paths)

MFCCs = [] #list to hold all MFCCs
labels = [] #list to hold all labels

for i, wav_path in enumerate(all_wav_paths):

    individual_MFCC = MFCC_from_wav(wav_path)
    #MFCC_from_wav() -> returns the MFCC coefficients 

    label = get_class(wav_path)
    #get_class() -> returns the label of the wav file either 0 or 1

    #add features and label to the array
    MFCCs.append(individual_MFCC)
    labels.append(label)

#Must convert the training data to a Numpy Array for 
#train_test_split and saving to local drive

X = np.array(MFCCs) #THIS LINE CRASHES WITH ABOVE ERROR

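# one-hot encode labels (OneHotEncoder expects a 2-D column of labels;
# newer scikit-learn versions name the `sparse` parameter `sparse_output`)
onehot_encoder = OneHotEncoder(sparse=False)
Y = onehot_encoder.fit_transform(np.array(labels).reshape(-1, 1))

#create train/test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)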

#saving data to local drive
np.save("LABEL_SAVE_PATH", Y)
np.save("TRAINING_DATA_SAVE_PATH", X)

Here is a snapshot of the shape of the MFCC's (from .wav files) in the MFCCs array

The MFCCs list contains arrays with the following shapes:

...More above...
(20, 423) #shape of returned MFCC from one of the .wav files
(20, 457)
(20, 1757)
(20, 345)
(20, 835)
(20, 345)
(20, 687)
(20, 774)
(20, 597)
(20, 719)
(20, 1195)
(20, 433)
(20, 728)
(20, 939)
(20, 345)
(20, 1112)
(20, 345)
(20, 591)
(20, 936)
(20, 1161)
....More below....

As you can see, the MFCCs in the list don't all have the same shape, because the recordings are not all the same length. Is this why I can't convert the list to a numpy array? If so, how do I give every MFCC in the list the same shape?

Any code snippets for accomplishing this and advice would be greatly appreciated!

Thanks!

Sreehari R
  • So, this `MFCCs = []` contains only the shape or the actual arrays itself? I mean what is the `type(individual_MFCC)` is it an array or just shape information? – kmario23 Dec 28 '17 at 04:00
  • Sorry for confusion the type(individual_MFCC) = – Sreehari R Dec 28 '17 at 04:03
  • Yes, the problem is due to shape mismatch between different arrays as you mentioned. I have two questions: 1) does the first dimension always have shape `(20, ...)` , 2) do you know the maximum size of your array? – kmario23 Dec 28 '17 at 04:11
  • @SreehariR Let's be a little [Minimal, Complete, and Verifiable](https://stackoverflow.com/help/mcve). Your `MFCCs` list (not array!) contains elements called `individual_MFCC`, and they are `np.array`s of different shapes. Correct? Let's say your two `individual_MFCC` are `np.array`s of shape `(20,424)` and `(20,457)`. What shape do you want the result have, when you put these together into the list of `MFCCs` and convert into `np.array`? – FatihAkici Dec 28 '17 at 04:12
  • @kmario23 Ok, so for (1) yes the first dimension always has shape (20, ....), and 2) I don't know the max size of my array. – Sreehari R Dec 28 '17 at 04:13
  • @kmario23 should I figure out the max size of the array for you – Sreehari R Dec 28 '17 at 04:14
  • @SreehariR I see. Also, does it make a difference in your training algorithm if we resize all your arrays to a fixed size (because we need fixed shape to create an array) and fill the missing values with `zeros`. Also, yes finding the max size should be very helpful – kmario23 Dec 28 '17 at 04:15
  • @kmario23 Yeah I was thinking about that but I wasn't sure how it would affect my training algorithm. I'm planning on using SVM, KNN, and Logistic Regression as a start. – Sreehari R Dec 28 '17 at 04:18
  • @kmario23 Ok, I'm running through data and finding max size now – Sreehari R Dec 28 '17 at 04:23
  • Ok, there're two possibilities: 1) take the lowest possible shape e.g. `(20, 345)` from your example, and then make all arrays to have this shape (i.e. discard the remaining columns). 2) Reshape arrays to the maximum shape i.e. fill the needed columns with `zeros` or `nan` values. however, I think that the first approach should be good. Then it's possible to stack the arrays – kmario23 Dec 28 '17 at 04:34
  • @kmario23 In either case, would I be able to use list.reshape(20, 345) or list.reshape(20, MAX_Value)? Would the reshape command fill in the extra values with 0? – Sreehari R Dec 28 '17 at 04:37
  • @SreehariR added a sample answer. Have a look! – kmario23 Dec 28 '17 at 05:01
  • @kmario23 Thanks for the example! Just a quick question, I am going to compare the performance of using a max_shape, and a min_shape. So in order to add all of the zeros to each index can I use the following logic MFCCs[idx] = arr[:, :max_shape[1]]? – Sreehari R Dec 28 '17 at 05:03

1 Answer


Use the following logic to truncate the arrays to min_shape, i.e. cut the larger arrays down to the column count of the smallest one:

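min_shape = (20, 345)  # smallest shape among the arrays
MFCCs = [arr1, arr2, arr3, ...]  # placeholder for your list of MFCC arrays

for idx, arr in enumerate(MFCCs):
    MFCCs[idx] = arr[:, :min_shape[1]]  # keep only the first 345 columns

batch_arr = np.array(MFCCs)  # shape: (num_arrays, 20, 345)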

And then you can stack these arrays into a batch array, as in the minimal example below:

In [33]: a1 = np.random.randn(2, 3)    
In [34]: a2 = np.random.randn(2, 5)    
In [35]: a3 = np.random.randn(2, 10)

In [36]: MFCCs = [a1, a2, a3]

In [37]: min_shape = (2, 2)

In [38]: for idx, arr in enumerate(MFCCs):
    ...:     MFCCs[idx] = arr[:, :min_shape[1]]
    ...:     

In [42]: batch_arr = np.array(MFCCs)

In [43]: batch_arr.shape
Out[43]: (3, 2, 2)
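If you would rather not hard-code min_shape, the smallest column count can be read from the list itself. The following is only a sketch: the random arrays are hypothetical stand-ins for real MFCC matrices, and the names `min_frames` and `truncated` are made up for illustration.

```python
import numpy as np

# Hypothetical stand-ins for real MFCC matrices: the same number of
# coefficients (rows) but different numbers of frames (columns).
rng = np.random.default_rng(0)
MFCCs = [rng.standard_normal((20, n)) for n in (423, 457, 345, 1757)]

# Find the smallest frame count in the list instead of hard-coding it.
min_frames = min(arr.shape[1] for arr in MFCCs)

# Keep only the first min_frames columns of every array.
truncated = [arr[:, :min_frames] for arr in MFCCs]

# All arrays now share one shape, so they stack into a 3-D batch.
batch_arr = np.stack(truncated)
print(batch_arr.shape)  # (4, 20, 345)
```

Note that truncating discards everything after the shortest recording's last frame, so longer recordings lose information.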

Now for the second strategy, padding the smaller arrays up to max_shape: follow similar logic, but fill the missing values with either zeros or nan values, as you prefer.

Then, again, you can stack the arrays into a batch array of shape (num_arrays, dim1, dim2). So, in your case, the shape should be (num_wav_files, 20, max_column).
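One possible way to do the padding is `np.pad`. Neither `reshape` nor slicing will do it: `reshape` cannot change the number of elements, and `arr[:, :max_shape[1]]` simply returns the array unchanged when it is already narrower than max_shape. The sketch below assumes zero padding on the right; `pad_to_width` is a hypothetical helper, and the random arrays again stand in for real MFCCs.

```python
import numpy as np

def pad_to_width(arr, width, fill_value=0.0):
    """Right-pad a 2-D array along axis 1 until it has `width` columns."""
    missing = width - arr.shape[1]
    if missing < 0:
        raise ValueError("array is already wider than the target width")
    # ((0, 0), (0, missing)): no padding on rows, `missing` columns on the right.
    return np.pad(arr, ((0, 0), (0, missing)),
                  mode="constant", constant_values=fill_value)

# Hypothetical stand-ins for real MFCC matrices.
rng = np.random.default_rng(0)
MFCCs = [rng.standard_normal((20, n)) for n in (423, 457, 345, 1757)]

# Largest frame count in the list.
max_frames = max(arr.shape[1] for arr in MFCCs)

# Pad every array to max_frames columns, then stack into a 3-D batch.
batch_arr = np.stack([pad_to_width(arr, max_frames) for arr in MFCCs])
print(batch_arr.shape)  # (4, 20, 1757)

# Estimators such as SVM, KNN and logistic regression in scikit-learn
# expect 2-D input, so flatten each padded matrix into one feature row.
X = batch_arr.reshape(len(batch_arr), -1)
print(X.shape)  # (4, 35140)
```

Pass `fill_value=np.nan` to pad with nan instead, but keep in mind that many estimators reject nan inputs, so zeros are usually the easier starting point.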

kmario23