
I have recently been running an experiment in which a neural network, written with Keras in the Python IDE IDLE, is used to classify songs from the GTZAN dataset by genre. I am varying the layers in order to see whether this has any impact on performance. My experiment is based on the following article, which details the basis of this project:

https://medium.com/@navdeepsingh_2336/identifying-the-genre-of-a-song-with-neural-networks-851db89c42f0

The code displayed in the article, taken together, forms this program:

import glob

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils.np_utils import to_categorical

def display_mfcc(song):
    y, _ = librosa.load(song)
    mfcc = librosa.feature.mfcc(y)

    plt.figure(figsize=(10, 4))
    librosa.display.specshow(mfcc, x_axis='time', y_axis='mel')
    plt.colorbar()
    plt.title(song)
    plt.tight_layout()
    plt.show()


def extract_features_song(f):
    y, _ = librosa.load(f)

    # MFCC features, normalised to [-1, 1] by the largest absolute value
    mfcc = librosa.feature.mfcc(y)
    mfcc /= np.amax(np.absolute(mfcc))

    # Flatten and truncate to a fixed length so every song yields an
    # equal-sized feature vector (roughly 20 coefficients x 1300 frames
    # for a 30-second GTZAN clip)
    return np.ndarray.flatten(mfcc)[:25000]

def generate_features_and_labels():
    all_features = []
    all_labels = []
    genres = ['blues', 'classical', 'country', 'disco', 'hiphop',
    'jazz', 'metal', 'pop', 'reggae', 'rock']

    for genre in genres:
        sound_files = glob.glob('genres/'+genre+'/*.au')
        print('Processing %d songs in %s genre...' %
              (len(sound_files), genre))
        for f in sound_files:
            features = extract_features_song(f)
            all_features.append(features)
            all_labels.append(genre)

    # Map the genre strings to integer ids, then one-hot encode them
    label_uniq_ids, label_row_ids = np.unique(all_labels, return_inverse=True)
    label_row_ids = label_row_ids.astype(np.int32, copy=False)
    onehot_labels = to_categorical(label_row_ids, len(label_uniq_ids))

    return np.stack(all_features), onehot_labels


features, labels = generate_features_and_labels()

print(np.shape(features))
print(np.shape(labels))

training_split = 0.8

# Stack the features and one-hot labels side by side (1000 x 25010) so
# that rows stay aligned through the shuffle
alldata = np.column_stack((features, labels))

np.random.shuffle(alldata)
splitidx = int(len(alldata) * training_split)
train, test = alldata[:splitidx,:], alldata[splitidx:,:]

print(np.shape(train))
print(np.shape(test))

train_input = test[:,:-10]
train_labels = train[:,-10:]

test_input = test[:,:-10]
test_labels = test[:,-10:]

print(np.shape(train_input))
print(np.shape(train_labels))

model = Sequential([
    Dense(100, input_dim=np.shape(train_input)[1]),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
    ])


model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
print(model.summary())

model.fit(train_input, train_labels, epochs=10, batch_size=32,
          validation_split=0.2) 
loss, acc = model.evaluate(test_input, test_labels, batch_size=32)

print('Done!')
print('Loss: %.4f, accuracy: %.4f' % (loss, acc))

I then received the expected output of:

Using TensorFlow backend.
Processing 100 songs in blues genre...
Processing 100 songs in classical genre...
Processing 100 songs in country genre...
Processing 100 songs in disco genre...
Processing 100 songs in hiphop genre...
Processing 100 songs in jazz genre...
Processing 100 songs in metal genre...
Processing 100 songs in pop genre...
Processing 100 songs in reggae genre...
Processing 100 songs in rock genre...
(1000, 25000)
(1000, 10)
(800, 25010)
(200, 25010)
(200, 25000)
(800, 10)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 100)               2500100   
_________________________________________________________________
activation_1 (Activation)    (None, 100)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010      
_________________________________________________________________
activation_2 (Activation)    (None, 10)                0         
=================================================================
Total params: 2,501,110
Trainable params: 2,501,110
Non-trainable params: 0

None

After which I received this error message:

Traceback (most recent call last):
  File "/Users/surengrigorian/Documents/Stage1.py", line 88, in <module>
    validation_split=0.2)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/engine/training.py", line 952, in fit
    batch_size=batch_size)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/engine/training.py", line 804, in _standardize_user_data
    check_array_length_consistency(x, y, sample_weights)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/engine/training_utils.py", line 237, in check_array_length_consistency
    'and ' + str(list(set_y)[0]) + ' target samples.')
ValueError: Input arrays should have the same number of samples as target arrays. Found 200 input samples and 800 target samples.

The article says the following about this section:

Overall, you have about 2.5 million parameters or weights. Next, run the fit. It takes the training input and training labels and takes the number of epochs that you want. You want 10, so that’s 10 repeats over the trained input. It takes a batch size that tells you the number, in this case, songs to go through before updating the weights; and a validation_split of 0.2 says to take 20% of that trained input, split it out, don’t actually train on that and use that to evaluate how well it’s doing after every epoch. It never actually trains on the validation split, but the validation split lets you look at the progress as it goes.
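As a sanity check on my side, the arithmetic behind those numbers appears to work out (this is just my own verification, not from the article):

inputs, units = 25000, 100
print(inputs * units + units)  # 2,500,100 weights plus biases, matching dense_1

# With 800 training rows, validation_split=0.2 should hold out 160 rows
# and train on the remaining 640 each epoch.
train_rows = 800
print(int(train_rows * 0.8), int(train_rows * 0.2))  # 640 160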

Thank you for any assistance you can provide.

Suren Grig

1 Answer


Better to use the split methods from scikit-learn:

from sklearn.model_selection import train_test_split, StratifiedShuffleSplit, StratifiedKFold

# apply scikit-learn stratified sampling
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.20, random_state=random_state)
for train_index, test_index in sss.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

Samer Ayoub
  • Where would I place this section of code? Thank you. – Suren Grig Jan 21 '19 at 04:54
  • replace these three lines: np.random.shuffle(alldata); splitidx = int(len(alldata) * training_split); train, test = alldata[:splitidx,:], alldata[splitidx:,:]. Then move the import to the top, and try alternating between the three methods (train_test_split, StratifiedShuffleSplit, StratifiedKFold) to see which suits you – Samer Ayoub Jan 21 '19 at 05:02
  • Hello. I appreciate your assistance. However, the Python terminal gave an error message along the lines of: Traceback (most recent call last): File "/Users/surengrigorian/Documents/Stage1.py", line 59, in sss = StratifiedShuffleSplit(n_splits=1, test_size=0.20, random_state=random_state) NameError: name 'random_state' is not defined – Suren Grig Jan 23 '19 at 02:18
  • random_state = 37 or any number ... for the seeding – Samer Ayoub Jan 23 '19 at 05:52
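Putting the comments together, a minimal sketch of the suggested replacement (assuming random_state = 37 as the comment suggests, and using the features and labels arrays from the question in place of X and y; stratifying on the argmax of the one-hot rows is my own addition, not from the answer):

from sklearn.model_selection import StratifiedShuffleSplit

random_state = 37  # any fixed integer; it only seeds the shuffle

# Stratify on the genre index recovered from the one-hot rows so each
# genre keeps the same 80/20 proportion in both splits.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.20,
                             random_state=random_state)
train_index, test_index = next(sss.split(features, labels.argmax(axis=1)))

# Indexing features and labels with the same row indices keeps every
# input paired with its own label; the question's slicing takes
# train_input from test but train_labels from train, which is what
# produces the 200-versus-800 mismatch in the ValueError.
train_input, test_input = features[train_index], features[test_index]
train_labels, test_labels = labels[train_index], labels[test_index]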