My CNN model kept reaching high accuracy and low loss during training but much lower accuracy and higher loss during validation, so I started to suspect that it was overfitting.
I therefore introduced a few dropout layers as well as some image augmentation. I also tried monitoring val_loss after each epoch, using ReduceLROnPlateau and EarlyStopping.
Although those measures improved validation accuracy a bit, I'm still nowhere close to the desired result, and I'm honestly running out of ideas. This is the result I'm getting right now:
Epoch 9/30
999/1000 [============================>.] - ETA: 0s - loss: 0.0072 - accuracy: 0.9980
Epoch 9: ReduceLROnPlateau reducing learning rate to 1.500000071246177e-05.
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0072 - accuracy: 0.9980 - val_loss: 2.2994 - val_accuracy: 0.6570 - lr: 1.5000e-04
Epoch 10/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0045 - accuracy: 0.9985 - val_loss: 2.2451 - val_accuracy: 0.6560 - lr: 1.5000e-05
Epoch 11/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0026 - accuracy: 0.9995 - val_loss: 2.6080 - val_accuracy: 0.6540 - lr: 1.5000e-05
Epoch 12/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0018 - accuracy: 1.0000 - val_loss: 2.8192 - val_accuracy: 0.6560 - lr: 1.5000e-05
Epoch 13/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0013 - accuracy: 1.0000 - val_loss: 2.8216 - val_accuracy: 0.6570 - lr: 1.5000e-05
32/32 [==============================] - 1s 23ms/step - loss: 2.8216 - accuracy: 0.6570
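To put a number on the gap, here is a small sketch that uses only the values copied from the log above (epochs 9 to 13). The helper name generalization_gaps is made up for illustration.

```python
# Values copied from the training log above (epochs 9 through 13).
epochs = [9, 10, 11, 12, 13]
train_acc = [0.9980, 0.9985, 0.9995, 1.0000, 1.0000]
val_acc = [0.6570, 0.6560, 0.6540, 0.6560, 0.6570]
val_loss = [2.2994, 2.2451, 2.6080, 2.8192, 2.8216]


def generalization_gaps(train, val):
    """Return the per-epoch difference (train - val), rounded to 4 decimals."""
    return [round(t - v, 4) for t, v in zip(train, val)]


acc_gaps = generalization_gaps(train_acc, val_acc)

# Epoch with the lowest validation loss among the epochs shown in the log.
best_val_epoch = epochs[val_loss.index(min(val_loss))]

for e, gap, vl in zip(epochs, acc_gaps, val_loss):
    print(f"epoch {e}: accuracy gap = {gap:.4f}, val_loss = {vl:.4f}")
print("lowest val_loss (among epochs shown) at epoch", best_val_epoch)
```

In the epochs shown, the accuracy gap stays at roughly 0.34, and val_loss rises from about 2.25 to 2.82 while the training loss keeps falling.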
Am I wrong to assume that overfitting is still the problem that prevents my model from scoring high on validation and test data?
Or is there something fundamentally wrong with my architecture?
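import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Data augmentation to reduce overfitting and generalize better
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.2),
])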
model = tf.keras.models.Sequential()
# Declare the input shape on the model itself, since Conv2D is not the first layer
model.add(tf.keras.Input(shape=(64, 64, 3)))
model.add(data_augmentation)
# "same" padding, since the edges of the pictures often contain valuable information
model.add(layers.Conv2D(64, (3,3), strides=(1,1), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Dropout(0.25))
model.add(layers.Conv2D(32, (3,3), strides=(1,1), padding='same', activation = 'relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Dropout(0.25))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
#prevent overfitting
model.add(layers.Dropout(0.25))
# 4 output classes; softmax so the outputs are class probabilities that sum to 1
model.add(layers.Dense(4, activation='softmax'))
# Labels are integer-encoded (not one-hot), hence sparse categorical crossentropy
model.compile(loss='sparse_categorical_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.00015), metrics=['accuracy'])
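For completeness, a minimal sketch of how the two callbacks mentioned above can be wired up. This is not the exact code: factor=0.1 is consistent with the drop from 1.5e-4 to 1.5e-5 in the log, but both patience values are assumptions chosen for illustration, and train_ds and val_ds are placeholder names.

```python
import tensorflow as tf

# factor=0.1 matches the learning-rate drop seen in the log (1.5e-4 -> 1.5e-5).
# patience=2 is an assumed value, not taken from the original setup.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.1, patience=2, verbose=1
)

# patience=4 is likewise an assumption; training stops once val_loss
# has not improved for that many consecutive epochs.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4)

callbacks = [reduce_lr, early_stop]

# Hypothetical usage; train_ds and val_ds stand in for the actual datasets:
# model.fit(train_ds, validation_data=val_ds, epochs=30, callbacks=callbacks)
```

Both callbacks monitor val_loss, so lowering the learning rate only helps if the validation loss can still improve. In the log it keeps rising after the reduction at epoch 9.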