
I am testing a cGAN in keras / tensorflow, and after 1000 epochs I saved the model.

After a bit of time I restored

  1. the generator model + weights
  2. the discriminator model + weights
  3. the GAN weights (the model is recreated)

This is the resulting val_loss:

[Plot: validation loss over epochs, showing a sudden drop after the model is restored]

There is clearly an immense drop in val_loss after restoring the model.

Could someone explain why this happens, or what could have caused it?

Stormsson
  • If you have used an optimizer with an adaptive learning rate, then it is fairly common for such a thing to happen. The learning rate probably decreased around the 1000th epoch, and as a result it helped the training process escape the plateau / stop jumping around a local minimum. – today Jul 10 '18 at 14:43
  • I'm not sure about this: the event at the 1000th epoch was me restarting the machine. It seems that something changed after reloading the model, but I don't understand what, because I saved and restored all weights. The only thing that differed was the state of the optimizer on the GAN; could that be the cause? – Stormsson Jul 10 '18 at 15:19
  • I think the state of the optimizer is saved as well when you save the Keras model, and a change in the learning rate, as I said, is one of the possible explanations. But if you changed the optimizer or its parameters after loading the model, then that could also be the reason. – today Jul 10 '18 at 15:35
  • The graph shows the validation loss, not the validation accuracy, doesn't it? – randhash Jul 10 '18 at 16:17
  • @critop I confirm that it is validation loss – Stormsson Jul 11 '18 at 06:11
  • The first time I saw this, I was doing a new train/val thus polluting my val dataset. – grabbag Aug 22 '20 at 15:43

1 Answer


Further analysis might be required to prove this, but you may have just unintentionally discovered a technique called "warm restarting". Simply put, you train your model with an annealing learning rate as usual, stop, reset the learning rate, and start over again. Intuitively, this gives the model opportunities to jump out of local minima, which could result in the observed behavior.
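The schedule behind warm restarts (SGDR-style cosine annealing with periodic resets) can be sketched in pure Python; the function name and the cycle lengths here are illustrative, not from the question:

```python
import math

def sgdr_lr(step, initial_lr=1e-3, first_cycle=1000, t_mul=2.0, min_lr=0.0):
    """Cosine-anneal the LR within a cycle; at each cycle boundary the
    LR jumps back to initial_lr and the next cycle is t_mul times longer."""
    cycle_len = first_cycle
    # Find the position of `step` inside its current cycle.
    while step >= cycle_len:
        step -= cycle_len
        cycle_len = int(cycle_len * t_mul)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / cycle_len))
    return min_lr + (initial_lr - min_lr) * cosine

# At the start of each cycle the LR is reset to its initial value,
# which is effectively what reloading the model did here.
print(sgdr_lr(0), sgdr_lr(500), sgdr_lr(1000))
```

TensorFlow ships a built-in version of this schedule as `tf.keras.optimizers.schedules.CosineDecayRestarts`, if you want the behavior on purpose rather than by accident.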

randhash
  • I believe this is what happened; after reloading the model the LR was back at its initial value – Stormsson Jul 11 '18 at 08:27
  • Maybe you're the first who actually tried this with a GAN. This calls for a paper (-; – randhash Jul 11 '18 at 10:47
  • The val_loss can also increase upon reloading the model's weights and restarting training for the same reason. This can be particularly problematic since the checkpoint threshold is also reset to Inf, overwriting the prior best weights with drastically worse weights. – user3673 Jan 02 '20 at 20:40