
I'm trying to fine-tune VGG16, but I sometimes get a validation accuracy that is constant: in some runs it is stuck at 0.0 and in others at 1.0, and the test accuracy matches it. It has also happened that the training accuracy is constant.

Here are some examples:

Adam, bs: 64, lr: 0.001

train_acc = [0.45828044, 0.4580425, 0.45812184, 0.45820114, 0.45820114, 0.45812184, 0.45820114, 0.45820114, 0.45820114, 0.4580425, 0.45820114, 0.45820114, 0.45812184, 0.45828044, 0.45820114, 0.45828044, 0.45812184, 0.45820114, 0.45812184, 0.45828044, 0.45820114, 0.45820114, 0.45812184, 0.45812184, 0.45820114, 0.45812184, 0.45828044, 0.45820114, 0.45828044, 0.45812184, 0.45820114, 0.45820114, 0.45812184, 0.45820114, 0.45820114, 0.45820114, 0.45828044, 0.45812184, 0.45828044, 0.4580425, 0.4580425, 0.45820114, 0.45820114, 0.45820114, 0.45828044, 0.45820114, 0.45812184, 0.45820114, 0.45820114, 0.45820114]
valid_acc = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
train_loss = [8.31718591143032, 8.35966631966799, 8.358442889857413, 8.357219463677575, 8.357219470939055, 8.358442853550015, 8.357219473359548, 8.357219434631658, 8.357219487882508, 8.359666328139717, 8.357219499984973, 8.357219495143987, 8.35844288017544, 8.355996039918232, 8.357219415267712, 8.355996025395273, 8.358442889857413, 8.357219521769412, 8.358442892277907, 8.355996052020698, 8.35721946609807, 8.357219415267712, 8.35844288017544, 8.358442885016427, 8.357219463677575, 8.358442882595934, 8.355996003610834, 8.357219458836589, 8.355996064123163, 8.357520040521766, 8.357219487882508, 8.357219480621028, 8.358442897118893, 8.357219495143987, 8.357219446734124, 8.35721945157511, 8.355996056861684, 8.358442911641852, 8.355996047179712, 8.359666311196264, 8.359666286991333, 8.35721946609807, 8.357219458836589, 8.35721944431363, 8.355996035077245, 8.357219453995603, 8.358442909221358, 8.357219439472644, 8.357219429790671, 8.357219461257083]
valid_loss = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

test_loss = 0.0
test_acc = 1.0

RMSprop, bs: 64, lr: 0.001

train_acc = [0.5421161, 0.54179883, 0.54179883, 0.54171956, 0.54171956, 0.5419575, 0.54187816, 0.54179883, 0.54187816, 0.5419575, 0.5419575]
valid_acc = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
train_loss = [6.990036433118249, 7.025707591003573, 7.025707559537161, 7.026923776278036, 7.02692376054483, 7.023275266444017, 7.024491474713166, 7.025707566798641, 7.024491443246754, 7.023275273705497, 7.0232752761259905]
valid_loss = [15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457]

test_loss = 15.33323860168457
test_acc = 0.0

SGD, bs: 64, lr: 0.01, momentum: 0.2

train_acc = [0.5406091, 0.5419575, 0.54187816, 0.54179883, 0.54187816, 0.54187816, 0.54187816, 0.54187816, 0.54179883, 0.54171956, 0.54179883]
valid_acc = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
train_loss = [6.990036433118249, 7.025707591003573, 7.025707559537161, 7.026923776278036, 7.02692376054483, 7.023275266444017, 7.024491474713166, 7.025707566798641, 7.024491443246754, 7.023275273705497, 7.0232752761259905]
valid_loss = [15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457]

test_loss = 15.33323860168457
test_acc = 0.0

SGD, bs: 64, lr: 0.01, momentum: 0.4

train_acc = [0.45740798, 0.45828044, 0.45820114, 0.45828044, 0.45820114, 0.4580425, 0.45820114, 0.45820114, 0.45820114, 0.45820114, 0.45820114]
valid_acc = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
train_loss = [8.329831461313413, 8.355996044759218, 8.357219475780042, 8.355996035077245, 8.357219502405467, 8.35966631603725, 8.357219461257083, 8.357219461257083, 8.357219456416097, 8.357219441893138, 8.357219478200534]
valid_loss = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

test_loss = 0.0
test_acc = 1.0

For the fine-tuning I used the following top layers:

model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
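
For reference, these layers sit on top of the VGG16 convolutional base roughly like this (a minimal sketch; the frozen base, input size, and compile settings are my assumptions, since only the head is shown above):

import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Flatten, Dense, Dropout
from tensorflow.keras.models import Sequential

# Pretrained convolutional base without the original classifier head.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained weights

model = Sequential()
model.add(base)
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))  # binary classification head

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])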

Do you have any idea why this happens?

Meanwhile I'm still trying to train the network, but often the training accuracy increases while the validation accuracy behaves very chaotically, varying a lot from one epoch to the next. Do you have any suggestions, please?


1 Answer


Training accuracy increasing while validation accuracy fluctuates is a classic symptom: the model is learning to "memorize" the training set, which is exactly the overfitting the validation set is there to detect.
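
As a standard guard against this (my suggestion, not something from the original question), you can add Keras's built-in EarlyStopping callback, which stops training once the validation loss stops improving and can restore the best weights:

import tensorflow as tf

# Stop when val_loss has not improved for 5 epochs; roll back to the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True,
)
model.fit(..., callbacks=[early_stop])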

Also, judging from the results, your model seems to be learning very slowly, if at all. Try tuning the hyperparameters.

One thing I notice (but cannot confirm): with transfer learning, a learning rate that is too large can destroy all the hard work of the pretrained model (here, VGG). I found this learning rate scheduler in one of Google's notebooks; try using it:

import tensorflow as tf

start_lr = 0.00001
min_lr = 0.00001
# `tpu_strategy` comes from the original TPU notebook; on a single
# GPU/CPU the replica count is effectively 1, so max_lr = 0.00005.
max_lr = 0.00005 * tpu_strategy.num_replicas_in_sync
rampup_epochs = 5
sustain_epochs = 0
exp_decay = .8

def lrfn(epoch):
  if epoch < rampup_epochs:
    # Linear warm-up from start_lr to max_lr.
    return (max_lr - start_lr) / rampup_epochs * epoch + start_lr
  elif epoch < rampup_epochs + sustain_epochs:
    # Hold at max_lr.
    return max_lr
  else:
    # Exponential decay towards min_lr.
    return (max_lr - min_lr) * exp_decay**(epoch - rampup_epochs - sustain_epochs) + min_lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(lrfn, verbose=True)
...
model.fit(..., callbacks=[lr_callback])

The idea is to use a low learning rate for the first epoch, ramp it up over a few epochs, and then slowly decay it.
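
For example, assuming a single replica (so max_lr = 5e-5; the replica count is my assumption), printing the schedule for the first ten epochs shows the warm-up and decay:

# Epoch 0 starts at 1e-5, ramps linearly to 5e-5 at epoch 5,
# then decays by a factor of 0.8 per epoch towards min_lr.
for epoch in range(10):
  print(f"epoch {epoch}: lr = {lrfn(epoch):.6f}")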

Minh-Long Luu