I'm trying to build a convolutional network with Keras (Theano backend), but I can't get accuracy above 33% (chance level for three balanced classes) when training with three classes. I'd appreciate it if someone could take a look at the code and help me improve the accuracy.

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
import numpy as np
import os
from matplotlib.image import imread
import pandas as pd
print("Imports Successful")
train_size = 3000
test_size = 750

batch_size = 10
num_classes = 3
epochs = 5

img_rows, img_cols = 1170, 580

train_dir = "C:/Users/aeshon/Desktop/Data/DataAndLabels/CroppedTrainData"
TrainArray = []
# Load 1000 images per class, in the order Gravel, Water, Road
for class_name in ["Gravel", "Water", "Road"]:
    for count in range(1000):
        img = imread(train_dir + "/" + class_name + " " + str(count) + ".jpg")
        TrainArray.append(img[:, :, 0])  # keep a single channel only
print("Train Array Synthesis Complete")
x_train = np.asarray(TrainArray)
del TrainArray
print("Array Deleted!")
test_dir = "C:/Users/aeshon/Desktop/Data/DataAndLabels/CroppedTestData"
TestArray = []
# Load 250 images per class, in the order Gravel, Water, Road
for class_name in ["Gravel", "Water", "Road"]:
    for count in range(250):
        img = imread(test_dir + "/" + class_name + " " + str(count) + ".jpg")
        TestArray.append(img[:, :, 0])  # keep a single channel only
print("Test Array Synthesis Complete")
x_test = np.asarray(TestArray)
del TestArray
print("Array Deleted!")

# Add the channel axis Keras expects: (samples, rows, cols, channels)
x_train = x_train.reshape(train_size, img_rows, img_cols, 1)
x_test = x_test.reshape(test_size, img_rows, img_cols, 1)

print("x_train shape:", x_train.shape)
#print(x_train.shape[0], 'train samples')
#print(x_test.shape[0], 'test samples')

# Labels are stored one value per row; cast to int before one-hot encoding
TrainLabels = np.asarray(pd.read_csv('C:/Users/aeshon/Desktop/Data/DataAndLabels/TrainingLabelsCompressed.csv')).astype(int)
y_train = TrainLabels

TestLabels = np.asarray(pd.read_csv('C:/Users/aeshon/Desktop/Data/DataAndLabels/TestingLabelsCompressed.csv')).astype(int)
y_test = TestLabels

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print("Label formatting complete!")

print('Data Configuration Successful! Moving on to model compilation')

# A small CNN: two conv layers, one max-pool, dropout, then a dense head
model = Sequential()
model.add(Conv2D(16, kernel_size=(3,3), activation='relu', input_shape=(img_rows, img_cols, 1)))
model.add(Conv2D(32, (3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss = keras.losses.categorical_crossentropy,
              optimizer = keras.optimizers.Adadelta(),
              metrics = ['accuracy'])

print("Model Compiled!")

model.fit(x_train, y_train, batch_size = batch_size, epochs = epochs,
          verbose = 1, validation_data = (x_test, y_test))

score = model.evaluate(x_test, y_test, verbose = 1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
model.save_weights("C:/Users/aeshon/Desktop/model.h5")
print("Saved model to disk")

Here you can see the training loop output (5 epochs):

Train on 3000 samples, validate on 750 samples
Epoch 1/5
3000/3000 [==============================] - 10321s 3s/step - loss: 10.5418 - acc: 0.3457 - val_loss: 10.7454 - val_acc: 0.3333
Epoch 2/5
3000/3000 [==============================] - 10165s 3s/step - loss: 10.7615 - acc: 0.3323 - val_loss: 10.7454 - val_acc: 0.3333
Epoch 3/5
3000/3000 [==============================] - 10256s 3s/step - loss: 10.5681 - acc: 0.3443 - val_loss: 10.7454 - val_acc: 0.3333
Epoch 4/5
3000/3000 [==============================] - 10591s 4s/step - loss: 10.8213 - acc: 0.3283 - val_loss: 10.7454 - val_acc: 0.3333
Epoch 5/5
3000/3000 [==============================] - 10750s 4s/step - loss: 10.7400 - acc: 0.3337 - val_loss: 10.7454 - val_acc: 0.3333
750/750 [==============================] - 367s 489ms/step
Test loss: 10.745396969795227
Test accuracy: 0.3333333333333333

I know this post is mostly code, but my question is somewhat open-ended. I've looked at other posts, including this one, but I already use categorical cross-entropy and softmax activation. Any comments help!

UPDATE:

I've made some fixes, but now the network is stuck in the same way at 66.67%. Here's a snapshot of the training output now:

Train on 3000 samples, validate on 750 samples
Epoch 1/10
3000/3000 [==============================] - 1438s 479ms/step - loss: 5.4465 - acc: 0.6500 - val_loss: 5.3739 - val_acc: 0.6667
Epoch 2/10
3000/3000 [==============================] - 1432s 477ms/step - loss: 5.3735 - acc: 0.6667 - val_loss: 5.3730 - val_acc: 0.6667
Epoch 3/10
3000/3000 [==============================] - 1439s 480ms/step - loss: 5.3728 - acc: 0.6667 - val_loss: 5.3730 - val_acc: 0.6667
Epoch 4/10
3000/3000 [==============================] - 1470s 490ms/step - loss: 5.3728 - acc: 0.6667 - val_loss: 5.3729 - val_acc: 0.6667
Epoch 5/10
3000/3000 [==============================] - 1440s 480ms/step - loss: 5.3728 - acc: 0.6667 - val_loss: 5.3730 - val_acc: 0.6667
Epoch 6/10
3000/3000 [==============================] - 1435s 478ms/step - loss: 5.3727 - acc: 0.6667 - val_loss: 5.3728 - val_acc: 0.6667

Are there any other things that need fixing?

UPDATE:

I've tried adding a confusion matrix after the model.fit() call. Have I implemented this correctly?

from sklearn.metrics import confusion_matrix, classification_report

# x_test is an in-memory array, so use model.predict() rather than
# predict_generator(), which expects a generator
Y_pred = model.predict(x_test, batch_size = batch_size)
y_pred = np.argmax(Y_pred, axis=1)
y_true = np.argmax(y_test, axis=1)  # undo the one-hot encoding
print('Confusion Matrix')
print(confusion_matrix(y_true, y_pred, labels=[0, 1, 2]))
print('Classification Report')
target_names = ['Gravel', 'Road', 'Water']
print(classification_report(y_true, y_pred, labels=[0, 1, 2], target_names=target_names))
  • The usual culprit is the learning rate; try lowering it. Additionally, I don't have much experience with AdaDelta, so if it were my experiment, I'd start with either Adam or plain SGD. And finally, the model seems pretty small, so it'd perhaps be easier and more predictable to train it without dropout first (a sketch of this change follows the thread below). Btw, the last time I saw Keras code, the `model.compile()` was a very lazy function, so your `print("Model Compiled!")` is actually misleading; the actual compilation only happens when you start processing some data. – dedObed Dec 01 '18 at 01:34
  • Thanks so much! I added your suggestions, but now the network is stuck at 66.67%. Should I just keep tuning the parameters, or is there another fix? – ab123 Dec 01 '18 at 04:32
  • I'm happy it helped. It's always hard to tell over the internet when I cannot play around with it; ML is still much of an alchemy ;-) Have a qualitative look at what this 66.67% means: is it two classes always being recognized correctly? If so, have a look at the images. Is the third class somehow (very) different? Also, it's quite suspicious that your loss/accuracy is basically the same for train/validation. Are you sure that the sets are disjoint? I will try to build an answer from this mess. – dedObed Dec 01 '18 at 09:36
  • Thanks. I've taken a look at the images, and it looks like the "Road" class is very different from Gravel and Water. I also have separate datasets for train/test, which you can see as CroppedTrainData and CroppedTestData. – ab123 Dec 01 '18 at 18:17
  • And how is classification going? Could you produce confusion matrix? – dedObed Dec 01 '18 at 18:18
  • I'm not sure how to produce or analyze a confusion matrix. – ab123 Dec 01 '18 at 18:21
  • See Wikipedia for the definition and an example (https://en.wikipedia.org/wiki/Confusion_matrix). Calculation is pretty straightforward: for each sample, collect its predicted class and then `confusion_matrix[label, prediction] += 1` (a code sketch follows the thread below). – dedObed Dec 01 '18 at 18:24
  • I'm not familiar with this part of CNNs, so could you please put some code that I can add into the original program above? Thanks! – ab123 Dec 01 '18 at 18:26
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/184563/discussion-between-ab123-and-dedobed). – ab123 Dec 01 '18 at 18:34
  • I've moved our discussion to the chat, just letting you know – ab123 Dec 01 '18 at 18:35
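
Based on dedObed's first comment, here is a minimal sketch of the suggested optimizer change, dropping Adadelta for plain SGD with an explicit learning rate. The learning-rate value is an illustrative guess, not a tuned setting; Adam (e.g. keras.optimizers.Adam(lr=1e-4)) would be the other option mentioned.

from keras.optimizers import SGD

# Recompile the model defined above with SGD and a small learning rate
model.compile(loss = keras.losses.categorical_crossentropy,
              optimizer = SGD(lr = 0.001),
              metrics = ['accuracy'])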
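
And here is dedObed's confusion-matrix recipe written out by hand in NumPy, as a sanity check against sklearn's confusion_matrix. The function name and arguments are made up for illustration; y_true and y_pred are integer class arrays of equal length.

import numpy as np

def manual_confusion_matrix(y_true, y_pred, num_classes = 3):
    # rows = true class, columns = predicted class
    cm = np.zeros((num_classes, num_classes), dtype = int)
    for label, prediction in zip(y_true, y_pred):
        cm[label, prediction] += 1
    return cm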
