
I'm running an experiment in which I build a neural network music analysis program with Keras and vary the number of layers in the network to measure the effect on performance. This is my source for the program.

I previously ran into several errors in this program and, on the advice of another developer on Stack Overflow, I decided to use the sklearn library to split the data.

This is the code I am using:

import librosa
import librosa.feature
import librosa.display
import glob
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils.np_utils import to_categorical
from sklearn.model_selection import (train_test_split,
                                     StratifiedShuffleSplit, StratifiedKFold)



def display_mfcc(song):
    y, _ = librosa.load(song)
    mfcc = librosa.feature.mfcc(y)

    plt.figure(figsize=(10, 4))
    librosa.display.specshow(mfcc, x_axis='time', y_axis='mel')
    plt.colorbar()
    plt.title(song)
    plt.tight_layout()
    plt.show()


def extract_features_song(f):
    y, _ = librosa.load(f)

    mfcc = librosa.feature.mfcc(y)
    mfcc /= np.amax(np.absolute(mfcc))

    return np.ndarray.flatten(mfcc)[:25000]

def generate_features_and_labels():
    all_features = []
    all_labels = []
    genres = ['blues', 'classical', 'country', 'disco', 'hiphop', 
'jazz', 'metal', 'pop', 'reggae', 'rock']
    for genre in genres:
        sound_files = glob.glob('genres/'+genre+'/*.au')
        print('Processing %d songs in %s genre...' % 
        (len(sound_files), genre))
        for f in sound_files:
            features = extract_features_song(f)
            all_features.append(features)
            all_labels.append(genre)

    label_uniq_ids, label_row_ids = np.unique(all_labels, 
    return_inverse=True)
    label_row_ids = label_row_ids.astype(np.int32, copy=False)
    onehot_labels = to_categorical(label_row_ids,   
    len(label_uniq_ids))
    return np.stack(all_features), onehot_labels


features, labels = generate_features_and_labels()

print(np.shape(features))
print(np.shape(labels))

training_split = 0.8

alldata = np.column_stack((features, labels))
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.20,  
random_state=37)
for train_index, test_index in sss.split(X, y):
  X_train, X_test = X[train_index], X[test_index]
  y_train, y_test = y[train_index], y[test_index]


print(np.shape(train))
print(np.shape(test))

train_input = test[:,:-10]
train_labels = train[:,-10:]

test_input = test[:,:-10]
test_labels = test[:,-10:]

print(np.shape(train_input))
print(np.shape(train_labels))

model = Sequential([
    Dense(100, input_dim=np.shape(train_input)[1]),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
    ])


model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
print(model.summary())

model.fit(train_input, train_labels, epochs=10, batch_size=32,
          validation_split=0.2) 

loss, acc = model.evaluate(test_input, test_labels, batch_size=32)

print('Done!')
print('Loss: %.4f, accuracy: %.4f' % (loss, acc))

The program began printing the expected output:

Using TensorFlow backend.
Processing 100 songs in blues genre...
Processing 100 songs in classical genre...
Processing 100 songs in country genre...
Processing 100 songs in disco genre...
Processing 100 songs in hiphop genre...
Processing 100 songs in jazz genre...
Processing 100 songs in metal genre...
Processing 100 songs in pop genre...
Processing 100 songs in reggae genre...
Processing 100 songs in rock genre...
(1000, 25000)
(1000, 10)

But this was interrupted by an error message:

Traceback (most recent call last):
  File "/Users/surengrigorian/Documents/Stage1.py", line 70, in <module>
    print(np.shape(train))
NameError: name 'train' is not defined

Thank you for any assistance you can provide concerning this matter.

Suren Grig

1 Answer

You're getting this NameError because your script never defines a variable called train.

So, you generated your dataset with the following line:

features, labels = generate_features_and_labels()

But when you partitioned the dataset into train and test datasets, you stored them in X_train, X_test and y_train, y_test in the following code snippet.

for train_index, test_index in sss.split(X, y):
  X_train, X_test = X[train_index], X[test_index]
  y_train, y_test = y[train_index], y[test_index]

Moreover, that loop cannot work as written in the code shown, because the lines before it,

alldata = np.column_stack((features, labels))
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.20,  
random_state=37)

show that you packed your dataset into alldata, but the arrays you pass to StratifiedShuffleSplit are X and y, and neither X nor y is defined anywhere in the script.

Given this, you could split features and labels directly:

for train_index, test_index in sss.split(features, labels):
  x_train, x_test = features[train_index], features[test_index]
  y_train, y_test = labels[train_index], labels[test_index]
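
As a self-contained sketch of this split, the snippet below uses synthetic arrays in place of the real MFCC features. The array sizes, the random data, and the class_ids name are assumptions for illustration only. It stratifies on integer class ids recovered from the one-hot rows with argmax, which is a cautious choice rather than something the original code requires.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic stand-ins for the real arrays (assumption: 10 genres x 100 songs,
# with a small feature width to keep the example fast).
rng = np.random.default_rng(0)
n_classes, per_class, n_features = 10, 100, 20
class_ids = np.repeat(np.arange(n_classes), per_class)
features = rng.normal(size=(n_classes * per_class, n_features))
labels = np.eye(n_classes, dtype=np.float32)[class_ids]  # one-hot rows

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.20, random_state=37)

# n_splits=1 yields a single (train, test) pair, so next() replaces the loop.
# Stratify on integer class ids recovered from the one-hot labels.
train_index, test_index = next(sss.split(features, labels.argmax(axis=1)))

x_train, x_test = features[train_index], features[test_index]
y_train, y_test = labels[train_index], labels[test_index]

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
print(y_test.sum(axis=0))  # number of test songs per genre
```

With balanced classes like these, each genre should contribute the same number of songs to the test set, which is the point of a stratified split.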

Then, if you want, you can check their shapes:

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
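
If you would rather keep the rest of your script, which expects train and test arrays with the labels in the last 10 columns, one option is to index alldata with the same indices. This is only a sketch on synthetic data; the array sizes, the random values, and the n_classes and class_ids names are assumptions. Note also that your script assigns train_input from test, which looks like a typo and would feed test rows to model.fit.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic stand-ins (assumption: 1000 songs, 20 features, 10 genres).
n_classes = 10
rng = np.random.default_rng(1)
class_ids = np.repeat(np.arange(n_classes), 100)
features = rng.normal(size=(1000, 20))
labels = np.eye(n_classes)[class_ids]  # one-hot rows

# Features and one-hot labels side by side, as in the question.
alldata = np.column_stack((features, labels))

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.20, random_state=37)
train_index, test_index = next(sss.split(features, class_ids))

# Define the names the rest of the script refers to.
train, test = alldata[train_index], alldata[test_index]

# The last n_classes columns are labels; everything before them is input.
train_input, train_labels = train[:, :-n_classes], train[:, -n_classes:]
test_input, test_labels = test[:, :-n_classes], test[:, -n_classes:]

print(train.shape, test.shape)
print(train_input.shape, train_labels.shape)
```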
afagarap