
I'm trying to build a CNN capable of detecting COVID-19 from chest X-rays. I'm using this kaggle dataset. It has roughly 27k images; I'm only using the COVID and NORMAL ones.

I started by following the Keras image classification tutorial, and after some tweaks I have something like this:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

batch_size = 16
img_height = 160
img_width = 160
img_size = (img_height, img_width)
seed_train_validation = 1
shuffle_value = True
validation_split = 0.3
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    image_size = img_size,
    batch_size = batch_size,  # was defined but never passed; default is 32
    validation_split = validation_split,
    subset = "training",
    seed = seed_train_validation,
    color_mode = "grayscale",
    shuffle = shuffle_value
)
class_names = train_ds.class_names  # capture before .cache()/.prefetch()
val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    image_size = img_size,
    batch_size = batch_size,
    validation_split = validation_split,
    subset = "validation",
    seed = seed_train_validation,
    color_mode = "grayscale",
    shuffle = shuffle_value
)
val_batches = tf.data.experimental.cardinality(val_ds)
test_ds = val_ds.take((2*val_batches) // 3)
val_ds = val_ds.skip((2*val_batches) // 3)
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
resize_and_rescale = tf.keras.Sequential([
  layers.Resizing(img_height, img_width),
  layers.Rescaling(1./255)
])
data_augmentation = tf.keras.Sequential([
  layers.RandomFlip("horizontal_and_vertical"),
  layers.RandomRotation(0.2),
  layers.RandomZoom(0.1)
])
num_classes = len(class_names)

model_1 = Sequential([
    resize_and_rescale,
    data_augmentation,
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])
model_1.compile(optimizer="adam",
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])
epochs = 75
history = model_1.fit(
    train_ds,
    validation_data = val_ds,
    epochs = epochs
)

If I train for fewer epochs, say 10, the accuracy and loss plots show a good exponential-looking curve; however, if I increase the number of epochs, I get some weird graphs like the ones below:

[Plot: results after training for 75 epochs]

I have already introduced data augmentation and a dropout layer, but I don't get better results no matter what. Any tips?

It seems that my model is overfitting, but I don't have enough experience to conclude that for sure. I read that adding data augmentation and a dropout layer works for most people, but that doesn't seem to be my case.
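A flat-then-diverging validation curve after a certain epoch is the classic overfitting signature, and one standard remedy is to simply stop training there. The patience logic behind Keras' `EarlyStopping` callback can be sketched in plain Python (the loss values below are made up for illustration):

```python
# Stop once the validation loss has not improved for `patience`
# consecutive epochs -- the same idea tf.keras.callbacks.EarlyStopping
# implements.

def early_stop_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the index of the last epoch that would be trained,
    given a per-epoch list of validation losses."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:    # improvement: reset the counter
            best = loss
            wait = 0
        else:                          # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch           # stop here
    return len(val_losses) - 1         # never triggered

# Validation loss improves for 4 epochs, then plateaus/worsens:
losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61]
print(early_stop_epoch(losses, patience=3))  # stops at epoch 6
```

In the actual training code this would be `tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)` passed via `callbacks=` to `model_1.fit`, so the 75-epoch budget becomes an upper bound rather than a fixed cost.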

  • 1) `model_1.summary()` shows several million parameters in the last layer. You might want to experiment with adding `strides=(2, 2)` to the first convolution layer or two to reduce the number of inputs at the point where you're flattening. 2) You should look into `SpatialDropout2D`, which is more effective at regularizing CNNs, because neighboring activations tend to be well correlated, so it is better to drop an entire feature map at once. – Nick ODell Jul 30 '23 at 22:54
  • @NickODell for some reason, I can't run the summary method on my model, it says that I first need to build it. Also, I'll add the strides and change Dropout to SpatialDropout2D and compare the differences, thanks. – Vinicius Cavalcante Jul 31 '23 at 02:23
  • You can fit the model for one epoch then run summary. – Nick ODell Jul 31 '23 at 02:28
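To see why the strides suggestion matters, here is a back-of-envelope count of the `Dense(128)` weights after `Flatten` (pure Python; assumes the 160×160 input, `'same'` padding so convolutions preserve spatial size, and 2×2 max pooling from the question's model):

```python
# Weight count of a Dense layer applied to the flattened feature maps,
# for the architecture in the question.

def dense_params_after_flatten(side, conv_strides, n_pools, last_filters, units):
    """Shrink the spatial side by each conv stride and each 2x2 pool,
    then count the Dense layer's weights + biases on the flat vector."""
    for s in conv_strides:
        side //= s            # a strided convolution shrinks the map
    for _ in range(n_pools):
        side //= 2            # each MaxPooling2D halves it
    flat = side * side * last_filters
    return flat * units + units   # weights + biases

# As written in the question: three stride-1 convs, three pools, 64 filters:
baseline = dense_params_after_flatten(160, [1, 1, 1], 3, 64, 128)
# With strides=(2, 2) on the first conv, as the comment suggests:
strided = dense_params_after_flatten(160, [2, 1, 1], 3, 64, 128)
print(baseline, strided)  # 3276928 819328
```

One stride-2 convolution quarters the flattened vector (20×20×64 = 25 600 down to 10×10×64 = 6 400), cutting the Dense layer from about 3.3M to about 0.8M parameters, which directly reduces the model's capacity to memorize the training set.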

1 Answer


After some more iterations, I guess I figured out the main problem.

My dataset directory structure was something like this:

MainDir/
  Class 0/
    Images/
    Masks/
  Class 1/
    Images/
    Masks/

So, when I ran the image_dataset_from_directory method, it was gathering all the images, masks included, and feeding them to my model, so it was pretty hard for it to find patterns between images and masks. After removing the masks entirely, my model seems to be working just fine.
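As an alternative to deleting the mask files, the image paths can be filtered before building the dataset. A sketch with pathlib (the folder names `MainDir`, `Class 0`, `Images`, `Masks` mirror the structure above and, like the `.png` extension, are assumptions about the actual layout):

```python
# Collect every .png under the dataset root, skipping anything inside
# a Masks folder. The demo builds a tiny throwaway tree to show it.
from pathlib import Path
import tempfile

def image_paths(root):
    """All .png files under `root` that are not inside a Masks folder."""
    return sorted(
        p for p in Path(root).rglob("*.png")
        if "Masks" not in p.parts
    )

# Tiny demo tree matching the structure described above:
root = Path(tempfile.mkdtemp()) / "MainDir"
for cls in ("Class 0", "Class 1"):
    for sub in ("Images", "Masks"):
        d = root / cls / sub
        d.mkdir(parents=True)
        (d / "sample.png").touch()

kept = image_paths(root)
# Only the two Images files survive; both Masks files are filtered out.
print([str(p.relative_to(root)) for p in kept])
```

Since `image_dataset_from_directory` can't skip subfolders on its own, you'd either restructure the directories or build a `tf.data.Dataset` from a filtered path list like this one.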