
First off, my assumptions might be wrong:

  1. Loss is how far each training example's output is from the correct answer, which is then divided by the number of examples (a kind of mean loss).
  2. Accuracy is the percentage of training examples that are classified correctly (if the highest output is taken as the predicted class, then an output of 0.7, which would give a loss of 0.3, still counts as a correct answer). A rough sketch of how I picture both being computed follows this list.
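
Here's a small hand-rolled sketch of how I picture these two numbers being computed for one batch (made-up predictions, not Keras internals):

import numpy as np

# Made-up one-hot labels and predicted probabilities for 4 examples, 3 classes
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.4, 0.5, 0.1]])

# Categorical cross-entropy: -log of the probability assigned to the true
# class, averaged over the batch
loss = -np.mean(np.log(np.sum(y_true * y_pred, axis=1)))

# Accuracy: fraction of examples whose highest output matches the true class
acc = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))

print(loss, acc)  # ~0.60 and 0.75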

To my eye that means that accuracy will typically be closer to 100% than loss will be to 0. This is not what I'm seeing:

10000/10000 [==============================] - 1067s - loss: 0.0408 - acc: 0.9577 - val_loss: 0.0029 - val_acc: 0.9995
Epoch 2/5
10000/10000 [==============================] - 991s - loss: 0.0021 - acc: 0.9997 - val_loss: 1.9070e-07 - val_acc: 1.0000
Epoch 3/5
10000/10000 [==============================] - 990s - loss: 0.0011 - acc: 0.4531 - val_loss: 1.1921e-07 - val_acc: 0.2440

That's 3 epochs from my second attempt at getting this working, with the training generator using shuffle=True. I also have results with shuffle=False (I initially thought that might be the issue), here:

10000/10000 [==============================] - 1168s - loss: 0.0079 - acc: 0.9975 - val_loss: 0.0031 - val_acc: 0.9995
Epoch 2/5
10000/10000 [==============================] - 1053s - loss: 0.0032 - acc: 0.9614 - val_loss: 1.1921e-07 - val_acc: 0.2439
Epoch 3/5
10000/10000 [==============================] - 1029s - loss: 1.1921e-07 - acc: 0.2443 - val_loss: 1.1921e-07 - val_acc: 0.2438
Epoch 4/5
10000/10000 [==============================] - 1017s - loss: 1.1921e-07 - acc: 0.2439 - val_loss: 1.1921e-07 - val_acc: 0.2438
Epoch 5/5
10000/10000 [==============================] - 1041s - loss: 1.1921e-07 - acc: 0.2445 - val_loss: 1.1921e-07 - val_acc: 0.2435

I use categorical_crossentropy for the loss, since I have 3 classes. I have more data than needed (about 178,000 images, each classified into one of the 3 classes).

Am I misunderstanding something, or has something gone wrong?

Here's my full code:

# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

# Initialising the CNN
classifier = Sequential()
# Step 1 - Convolution
classifier.add(Conv2D(32, (3, 3), input_shape = (200, 200, 3), activation = 'relu'))
# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Adding a second convolutional layer
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Step 3 - Flattening
classifier.add(Flatten())
# Step 4 - Full connection
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 3, activation = 'sigmoid'))
# Compiling the CNN
classifier.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
# Part 2 - Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('dataset/training_set',
                target_size = (200, 200),
                batch_size = 64,
                class_mode = 'categorical',
                shuffle=True)

test_set = test_datagen.flow_from_directory('dataset/test_set',
                target_size = (200, 200),
                batch_size = 62,
                class_mode = 'categorical',
                shuffle=True)

classifier.fit_generator(training_set,
                steps_per_epoch = 10000,
                epochs = 5,
                validation_data = test_set,
                validation_steps=1000)

classifier.save("CSGOHeads.h5")
# Part 3 - Making new predictions
import numpy as np
from keras.preprocessing import image
test_image = image.load_img('dataset/single_prediction/1.bmp', target_size = (200, 200))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis = 0)
result = classifier.predict(test_image)
training_set.class_indices
if result[0][0] == 1:
    prediction = 'head'
else:
    prediction = 'not'
  • Could you please post your code as well? Obviously, during the training process (assuming a proper setup) both validation and training loss decrease and accuracy increases until reaching a plateau (and if you continue training, the validation loss may eventually start increasing and validation accuracy drop, which is called over-fitting). But in your case the accuracy is high at the very beginning, which is very strange and a possible sign of a bug or mistake in your code. – today Jun 18 '18 at 20:20
  • @today I've posted my entire code; let me know if anything is unclear (I'm still new-ish to Python, so it may look like a complete mess). Thanks – FraserOfSmeg Jun 18 '18 at 20:44

2 Answers


Since you are classifying images into one of 3 classes (i.e. single-label multi-class classification: there are multiple classes, but each image has exactly one label), you should use softmax as the activation function of the last layer instead of sigmoid:

classifier.add(Dense(units = 3, activation = 'softmax')) # don't use sigmoid here

If you want me to explain more, let me know and I will update my answer.
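
As a quick hand-rolled illustration of why this matters (made-up scores, not taken from the question's model): sigmoid squashes each output independently, so all three class scores can sit near 1 at the same time and the cross-entropy of the true class can be driven towards 0 without the prediction being any good, whereas softmax always produces a proper probability distribution over the 3 classes:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([6.0, 5.5, 7.0])   # made-up pre-activation outputs

print(sigmoid(scores))   # ~[0.998, 0.996, 0.999], sums to ~3
print(softmax(scores))   # ~[0.231, 0.140, 0.629], sums to 1

y_true = np.array([1, 0, 0])         # true class is the first one
print(-np.log(np.sum(y_true * sigmoid(scores))))  # ~0.002: tiny loss even though argmax picks class 2
print(-np.log(np.sum(y_true * softmax(scores))))  # ~1.46: softmax penalises the wrong prediction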

today

To complement @today's answer: if the last layer's activation is sigmoid, the loss should be binary_crossentropy; that pairing is the recipe for multi-label classification problems. For single-label classification, use softmax plus categorical_crossentropy. Do not mix sigmoid with categorical_crossentropy.
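
In Keras terms, the two consistent pairings look roughly like this (a minimal sketch; the input_shape of 128 is just a stand-in for whatever comes before the output layer):

from keras.models import Sequential
from keras.layers import Dense

# Single-label: 3 mutually exclusive classes (the question's case)
single_label = Sequential()
single_label.add(Dense(units = 3, activation = 'softmax', input_shape = (128,)))
single_label.compile(optimizer = 'adam',
                     loss = 'categorical_crossentropy',
                     metrics = ['accuracy'])

# Multi-label: each image may carry several of the 3 labels at once
multi_label = Sequential()
multi_label.add(Dense(units = 3, activation = 'sigmoid', input_shape = (128,)))
multi_label.compile(optimizer = 'adam',
                    loss = 'binary_crossentropy',
                    metrics = ['accuracy'])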

neurite