I am training deep models for MRI segmentation, specifically U-Net++ and UNet3+. When I plot the training and validation losses of these models over time, both end with a sudden drop in loss followed by a permanent plateau.
Here are the plots of the training and validation loss curves, along with the corresponding segmentation performance (Dice score) on the validation set. The drop in loss occurs at around epoch 80 and is clearly visible in the graphs.
As for what I've tried:
- Perhaps the model is stuck in a local minimum that is hard to escape, so I tried resuming training at epoch 250 with the learning rate increased by a factor of 10, but the plateau stays exactly the same no matter how many more epochs I train. I also tried resuming with the learning rate reduced by a factor of 10 and by a factor of 100, with no change either (see the sketch after this list for how I resumed training).
- Perhaps the model has too many parameters, i.e. the plateau is caused by over-fitting, so I tried training models with fewer parameters. This changed the loss value (Y-axis level) at which the plateau occurs, but the general shape of a sudden drop followed by a plateau remains. I also tried increasing the parameter count (since it was easy to do), and the same problem appears.
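For reference, this is roughly how I resumed training with a modified learning rate. It's a minimal sketch: the model configuration (here via `segmentation_models_pytorch`), the checkpoint path, and the checkpoint keys are simplified placeholders for my actual training script.

```python
import torch
import segmentation_models_pytorch as smp

# Rebuild the model and restore weights from the epoch-250 checkpoint.
# Path and dict keys are placeholders for my actual setup.
model = smp.UnetPlusPlus(encoder_name="resnet34", in_channels=1, classes=1)
checkpoint = torch.load("checkpoints/epoch_250.pt")
model.load_state_dict(checkpoint["model_state"])

# Restore the optimizer state, then override its learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
optimizer.load_state_dict(checkpoint["optimizer_state"])

# Scale the LR up by 10x (I also tried dividing by 10 and 100)
# to see whether training can escape the plateau.
for param_group in optimizer.param_groups:
    param_group["lr"] *= 10
```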
Any ideas about what could be causing this plateau, or how I could get past it?