
Any idea why our training loss is smooth while our validation loss is so noisy (see the link) across epochs? We are implementing a deep learning model for diabetic retinopathy detection (binary classification) using the data set of fundus photographs provided by this Kaggle competition. We are using Keras 2.0 with a TensorFlow backend.

As the data set is too big to fit in memory, we are using fit_generator, with ImageDataGenerator randomly taking images from the training and validation folders:

# TRAIN THE MODEL
model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples // training_batch_size,
    epochs=int(config['training']['epochs']),
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_batch_size,
    class_weight=None)
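
For context, the generators above would typically be built along these lines. This is a minimal sketch, not the question's actual code: the directory paths, image size, and the precomputed train_mean/train_std values are assumptions.

from keras.preprocessing.image import ImageDataGenerator

# Placeholders: in practice these statistics are computed once from the training set.
train_mean, train_std = 127.5, 50.0

def normalize(x):
    # normalize with the training set mean and standard deviation
    return (x - train_mean) / train_std

train_datagen = ImageDataGenerator(
    preprocessing_function=normalize,
    horizontal_flip=True,   # augmentation: random horizontal flips
    vertical_flip=True)     # augmentation: random vertical flips

train_generator = train_datagen.flow_from_directory(
    'data/train',            # assumed folder layout: one subfolder per class
    target_size=(224, 224),  # assumed VGG16 input size
    batch_size=32,
    class_mode='binary')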

Our CNN architecture is VGG16 with dropout = 0.5 in the last two fully connected layers, batch normalization only before the first fully connected layer, and data augmentation (consisting of flipping the images horizontally and vertically). Our training and validation samples are normalized using the training set mean and standard deviation. Batch size is 32. Our activation is a sigmoid and the loss function is binary_crossentropy. You can find our implementation on GitHub.
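
For reference, a minimal sketch of a model matching that description. The ImageNet weights, 224x224 input, 4096-unit fully connected layers, and Adam optimizer are assumptions, not taken from the question:

from keras.applications.vgg16 import VGG16
from keras.layers import BatchNormalization, Dense, Dropout, Flatten
from keras.models import Model

# VGG16 convolutional base with a custom fully connected top
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

x = Flatten()(base.output)
x = BatchNormalization()(x)           # batch norm only before the first FC layer
x = Dense(4096, activation='relu')(x)
x = Dropout(0.5)(x)                   # dropout = 0.5 in the last two FC layers
x = Dense(4096, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(1, activation='sigmoid')(x)  # sigmoid for binary classification

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])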

It definitely has nothing to do with overfitting, as we tried with a highly regularized model and the behavior was much the same. Is it related to the sampling from the validation set? Have any of you had a similar problem before?

Thanks!!

  • What training/testing ratio did you choose for your model? That is, how much training and testing data do you have? Also, does this noisy validation loss happen when you try several training runs? It would help if you provided all the parameters you used here (steps per epoch, epochs, etc.). – DarkCygnus Oct 28 '17 at 04:49
  • The number of training samples is ~32,000, with around 20% positive and 80% negative. I have exactly the same distribution in the 3,000 validation samples. I've trained the model using different combinations of regularization strategies (dropout, weight decay, batch normalization, augmentation, etc.) but I always got the same noisy validation loss. Steps per epoch is equal to the number of training samples divided by the batch size (around 1,000 steps). – user2227561 Oct 30 '17 at 18:16
  • Did you find anything useful? @user2227561 – Abhishek Singh Jun 21 '19 at 20:17

1 Answer


I would look at the following, in this order:

  • a bug in the validation_generator setup, including validation_steps: does it actually go through all the images reserved for validation? (See the first sketch after this list.)
  • do not use augmentation in validation_generator: an augmentation may be harmful and effectively unlearnable, so during training the model can only score well on it by hard-coding relationships that do not generalize
  • change the train/validation split to 50/50
  • calculate the validation loss at the end of each epoch via a custom callback; it uses the same loss function, but calling it from a callback can produce different (and, for certain non-standard models, more accurate) results (see the callback sketch below)
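
A minimal sketch of the first two points, reusing the assumed paths, image size, and normalize function from the question's setup: build the validation generator without any augmentation, and round validation_steps up so no reserved image is skipped.

import math

from keras.preprocessing.image import ImageDataGenerator

# Validation: same normalization as training, but no flips or other augmentation.
val_datagen = ImageDataGenerator(preprocessing_function=normalize)

validation_generator = val_datagen.flow_from_directory(
    'data/validation',       # assumed path
    target_size=(224, 224),  # assumed input size
    batch_size=32,
    class_mode='binary',
    shuffle=False)           # a fixed order makes successive evaluations comparable

# Floor division (samples // batch_size) silently drops the last partial batch;
# rounding up guarantees every validation image is seen each epoch.
validation_steps = int(math.ceil(
    validation_generator.samples / float(validation_generator.batch_size)))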
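And a sketch of the last point: a hypothetical callback (the class name is mine) that evaluates the full validation set with the same compiled loss at the end of every epoch.

from keras.callbacks import Callback

class FullValidationEval(Callback):
    """Evaluate the whole validation set at the end of each epoch."""

    def __init__(self, val_generator, val_steps):
        super(FullValidationEval, self).__init__()
        self.val_generator = val_generator
        self.val_steps = val_steps

    def on_epoch_end(self, epoch, logs=None):
        # Uses the loss and metrics the model was compiled with
        # (here: binary_crossentropy and accuracy).
        loss, acc = self.model.evaluate_generator(self.val_generator,
                                                  steps=self.val_steps)
        print('epoch %d: full validation loss %.4f, accuracy %.4f'
              % (epoch + 1, loss, acc))

It can then be passed to fit_generator via callbacks=[FullValidationEval(validation_generator, validation_steps)].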

If none of the above gives a smoother validation loss curve, then my next assumption would be that this is simply the way it is, and I would start working on the model architecture.

– tyrex