I am trying to extract features from .wav files using MFCCs.

I'm having trouble converting my list of MFCCs to a numpy array. From my understanding, the error is due to the MFCCs within the list not all having the same dimensions, but I'm not sure of the best way to resolve this.

When running this code below:

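import numpy as np
from tqdm import tqdm

# new_dataset, class_distribution, prob_distribution, classes and config are defined earlier
X = []
y = []
_min, _max = float('inf'), -float('inf')
for _ in tqdm(range(len(new_dataset))):
    # pick a class according to prob_distribution, then a random file of that class
    rand_class = np.random.choice(class_distribution.index, p=prob_distribution)
    file = np.random.choice(new_dataset[new_dataset.word == rand_class].index)
    label = new_dataset.at[file, 'word']
    X_sample = new_dataset.at[file, 'coeff']
    # track the global minimum and maximum coefficient values
    _min = min(np.amin(X_sample), _min)
    _max = max(np.amax(X_sample), _max)
    X.append(X_sample if config.mode == 'conv' else X_sample.T)
    y.append(classes.index(label))
X, y = np.array(X), np.array(y)     # crashes here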

I get the following error message:

Traceback (most recent call last):

  File "<ipython-input-150-8689abab6bcf>", line 14, in <module>
    X, y = np.array(X), np.array(y)

ValueError: could not broadcast input array from shape (13,97) into shape (13)

Adding print(X_sample.shape) in the loop produces:

(13, 74)
(13, 83)
(13, 99)
(13, 99)
(13, 99)
(13, 55)
(13, 92)
(13, 99)
(13, 99)
(13, 78)
...

From checking, it seems that the MFCCs don't all have the same shape because the recordings are not all the same length.

I'd like to know whether I'm correct in assuming that this is the issue and, if so, how to fix it. If this isn't the issue, I'd equally like to know the solution.

Thanks in advance!

Mark Setchell
Zizi96
  • `hstack` can join on the 2nd dim, making a (13,n) array – hpaulj Nov 29 '19 at 00:43
  • @Geeocode Added the full traceback. [Here](https://imgur.com/a/5rNBdti) you can peek at the data. Note that each MFCC has a dimension of 13x99 (13 features measured 99 times in a wav file). There are 94824 wav files, so my goal is to have the input dimensions (94824, 13, 99, 1) to input into a CNN. – Zizi96 Nov 29 '19 at 01:10
  • are you sure, that `X, y = np.array(X), np.array(y)` is the 15th line? – Geeocode Nov 29 '19 at 01:19
  • I removed some code after getting the traceback so `X, y = np.array(X), np.array(y)` is actually on line 13 by count. Running the code posted, I get the same traceback error but referencing line 14 this time. Could this be the issue? – Zizi96 Nov 29 '19 at 01:30
  • @hpaulj Thanks for the suggestion, but I'm not sure if that would help with getting to my final objective. See the response to Geeocode directly below for more info. – Zizi96 Nov 29 '19 at 01:32
  • ok, but what code is in your current line 14? As this is not the full code, I can't figure out what exactly is in this line. – Geeocode Nov 29 '19 at 01:33
  • `np.array` can make an object dtype array from some mixtures of arrays, but in this case, where the 1st dimensions are the same, it produces an error. – hpaulj Nov 29 '19 at 01:44

3 Answers

This reproduces your error:

In [186]: np.array([np.zeros((4,5)),np.ones((4,6))])                            
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-186-e369332b8a05> in <module>
----> 1 np.array([np.zeros((4,5)),np.ones((4,6))])

ValueError: could not broadcast input array from shape (4,5) into shape (4)

If the arrays all have the same shape:

In [187]: np.array([np.zeros((4,6)),np.ones((4,6))]).shape                      
Out[187]: (2, 4, 6)

If one or more differs in the first dimension, we get an object dtype array, essentially an array wrapper around the list:

In [188]: np.array([np.zeros((4,6)),np.ones((3,6))]).shape                      
Out[188]: (2,)

Don't try to combine arrays that (may) differ in shape unless you understand what you need, and what you intend to do with the result. It is possible to make an object dtype array in the first case, but the construction process is a bit roundabout. I won't go into that unless you really need such an array.
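
As a minimal sketch of that advice, one could check the shapes before combining and fail with a clearer message than the broadcast error. The helper name `stack_same_shape` is hypothetical (it is not a NumPy function), and the shapes are only illustrative:

```python
import numpy as np

def stack_same_shape(arrays):
    """Stack arrays into one ndarray, failing early if their shapes differ.

    Hypothetical helper for illustration; not part of NumPy.
    """
    shapes = {a.shape for a in arrays}
    if len(shapes) != 1:
        raise ValueError(f"cannot stack arrays with differing shapes: {sorted(shapes)}")
    return np.stack(arrays)

# All shapes equal: we get a regular 3-D array.
same = [np.zeros((13, 99)), np.ones((13, 99))]
print(stack_same_shape(same).shape)

# Shapes differ in the second dimension: the helper reports which shapes it saw.
mixed = [np.zeros((13, 74)), np.ones((13, 99))]
try:
    stack_same_shape(mixed)
except ValueError as err:
    print(err)
```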

hpaulj
  • @hpaulij Thanks for the response, it has been very informative. I think I need to go back to my data, pad the values that aren't long enough and remove those that are too long. Unequal dimensions being fed into an object dtype array are causing these issues! – Zizi96 Nov 29 '19 at 12:20
You will need to truncate or pad the time dimension so that all the arrays have the same size. If the lengths vary widely, you can use fixed-length analysis windows (say over 1 or 10 seconds of MFCCs) and have several of these per input audio clip. This principle is shown in: How to use a context window to segment a whole log Mel-spectrogram (ensuring the same number of segments for all the audios)?
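
A rough sketch of both ideas, assuming each MFCC is a 2-D array of shape (n_coeffs, time) as in the question. The function names `pad_or_truncate` and `split_into_windows`, the zero padding value, and the window and hop sizes are assumptions chosen for illustration, and random data stands in for real MFCCs:

```python
import numpy as np

def pad_or_truncate(mfcc, n_frames, pad_value=0.0):
    """Return a copy of `mfcc` (n_coeffs, time) with exactly `n_frames` columns."""
    n_coeffs, length = mfcc.shape
    if length >= n_frames:
        return mfcc[:, :n_frames].copy()      # too long: cut off the tail
    out = np.full((n_coeffs, n_frames), pad_value, dtype=mfcc.dtype)
    out[:, :length] = mfcc                    # too short: pad at the end
    return out

def split_into_windows(mfcc, window, hop):
    """Cut `mfcc` into fixed-length windows along time, dropping a trailing partial window."""
    starts = range(0, mfcc.shape[1] - window + 1, hop)
    return [mfcc[:, s:s + window] for s in starts]

# Stand-in data with varying lengths, like the shapes printed in the question.
rng = np.random.default_rng(0)
clips = [rng.normal(size=(13, n)) for n in (74, 83, 99, 55)]

# Option 1: force every clip to 99 frames, then stack and add a channel axis.
X = np.stack([pad_or_truncate(c, 99) for c in clips])
X = X[..., np.newaxis]
print(X.shape)

# Option 2: several fixed-length windows per clip.
windows = split_into_windows(clips[2], window=40, hop=20)
print(len(windows), windows[0].shape)
```

Whether zero is a sensible padding value depends on how the MFCCs are normalized, so treat that default as a placeholder.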

Jon Nordby
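The mixed-shape case that raised the ValueError, with shapes (4,5) and (4,6), works if we preallocate an object dtype array and assign into it: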

In [189]: arr = np.zeros(2,object)                                              
In [190]: arr[:] = [np.zeros((4,5)),np.ones((4,6))]                             
In [191]: arr                                                                   
Out[191]: 
array([array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]]),
       array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])], dtype=object)
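
A small sketch of what such an array looks like from the outside, filling it one element at a time (an alternative to the slice assignment; the variable names are only illustrative). The result is 1-D, and each element keeps its own shape:

```python
import numpy as np

arrays = [np.zeros((4, 5)), np.ones((4, 6))]

# Preallocate a 1-D object array and store each ndarray in its own slot.
arr = np.empty(len(arrays), dtype=object)
for i, a in enumerate(arrays):
    arr[i] = a

print(arr.shape, arr.dtype)       # the container is 1-D with dtype object
print([a.shape for a in arr])     # each element keeps its original shape
```

This is a container of separate arrays rather than a single 3-D numeric array.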
hpaulj