I am trying to extract features from .wav files using MFCCs.
I'm having trouble converting my list of MFCCs to a numpy array. From my understanding, the error occurs because the MFCCs in the list do not all have the same dimensions, but I'm not sure of the best way to resolve this.
When running this code below:
X = []
y = []
_min, _max = float('inf'), -float('inf')
for _ in tqdm(range(len(new_dataset))):
    # sample a class according to the class probability distribution,
    # then pick a random file of that class
    rand_class = np.random.choice(class_distribution.index, p=prob_distribution)
    file = np.random.choice(new_dataset[new_dataset.word == rand_class].index)
    label = new_dataset.at[file, 'word']
    X_sample = new_dataset.at[file, 'coeff']
    _min = min(np.amin(X_sample), _min)
    _max = max(np.amax(X_sample), _max)  # was np.amin, which never tracks the true maximum
    X.append(X_sample if config.mode == 'conv' else X_sample.T)
    y.append(classes.index(label))
X, y = np.array(X), np.array(y)  # crashes here
I get the following error message:
Traceback (most recent call last):
File "<ipython-input-150-8689abab6bcf>", line 14, in <module>
X, y = np.array(X), np.array(y)
ValueError: could not broadcast input array from shape (13,97) into shape (13)
Adding print(X_sample.shape) inside the loop produces output such as:
(13, 74)
(13, 83)
(13, 99)
(13, 99)
(13, 99)
(13, 55)
(13, 92)
(13, 99)
(13, 99)
(13, 78)
...
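As far as I can tell, this is the failure in miniature: numpy refuses to stack 2-D arrays whose second dimensions differ (a sketch reproducing the error with dummy arrays in place of my MFCCs):

```python
import numpy as np

# two dummy "MFCC" arrays with matching first dims but different time axes
a = np.zeros((13, 74))
b = np.zeros((13, 83))

try:
    stacked = np.array([a, b])  # mismatched shapes cannot form a regular ndarray
except ValueError as e:
    print("ValueError:", e)
```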
From checking, it seems the MFCCs don't all have the same shape because the recordings are not all the same length.
I'd like to know if I'm correct in my assumption that this is the issue, and if so, how do I fix it? If this isn't the cause, I'd equally like to know the solution.
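One approach I'm considering is zero-padding (or truncating) each MFCC along the time axis to a common length before stacking. A minimal sketch, where pad_mfccs and max_len are hypothetical names I made up for illustration:

```python
import numpy as np

def pad_mfccs(mfcc_list, max_len=None):
    """Zero-pad (or truncate) each (n_mfcc, t) array along the time
    axis so every array has the same width, then stack them."""
    if max_len is None:
        # default to the longest recording in the batch
        max_len = max(m.shape[1] for m in mfcc_list)
    padded = []
    for m in mfcc_list:
        m = m[:, :max_len]                       # truncate if too long
        pad_width = max_len - m.shape[1]         # zero-pad if too short
        padded.append(np.pad(m, ((0, 0), (0, pad_width)), mode='constant'))
    return np.array(padded)                      # shape: (n_samples, n_mfcc, max_len)

X = pad_mfccs([np.random.rand(13, 74), np.random.rand(13, 99)])
print(X.shape)  # (2, 13, 99)
```

I'm unsure whether padding with zeros is acceptable for MFCC features or whether I should pad with the feature minimum instead.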
Thanks in advance!