
In the official Keras documentation for the ReduceLROnPlateau class (https://keras.io/api/callbacks/reduce_lr_on_plateau/) they mention that

"Models often benefit from reducing the learning rate"

Why is that so? It's counter-intuitive to me, at least, since from what I know a higher learning rate allows taking larger steps from my current position.

Thanks!

21kc
  • You might get downvotes because your question is not about code; I suggest you post it on https://stats.stackexchange.com/ – Yefet Jan 24 '21 at 13:02
  • I didn't know that Stack Overflow is just for explicit code questions. Thanks. – 21kc Jan 24 '21 at 16:19

1 Answer


Neither a too-high nor a too-low learning rate should be used for training a NN. A large learning rate can step over the global minimum and, in extreme cases, can cause the model to diverge from the optimal solution entirely. On the other hand, a small learning rate can get stuck in a local minimum.
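
To make the step-size intuition concrete, here is a toy gradient-descent sketch in plain Python (the objective f(w) = w², the starting point, and the learning rates are chosen purely for illustration): a large learning rate overshoots the minimum on every step and diverges, while a very small one barely moves.

```python
def gradient_descent(lr, w=5.0, steps=10):
    # Minimize f(w) = w^2, whose gradient is 2*w and whose minimum is at w = 0.
    for _ in range(steps):
        w = w - lr * 2 * w  # each update scales w by (1 - 2*lr)
    return w

print(gradient_descent(lr=1.1))   # |1 - 2*lr| > 1: overshoots and diverges
print(gradient_descent(lr=0.1))   # converges steadily toward 0
print(gradient_descent(lr=1e-4))  # barely moves in 10 steps
```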

ReduceLROnPlateau's purpose is to track your model's performance and reduce the learning rate when there is no improvement for x number of epochs. The intuition is that the model has approached a sub-optimal solution with the current learning rate and is oscillating around the global minimum. Reducing the learning rate enables the model to take smaller steps toward the optimal solution of the cost function.
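
For reference, this is roughly how the callback is wired into training (the monitor, factor, patience, and min_lr values are illustrative, not recommendations; model, x_train, etc. are assumed to exist already):

```python
from tensorflow import keras

# Halve the learning rate when val_loss has not improved for 3 consecutive
# epochs, but never reduce it below 1e-6.
reduce_lr = keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.5,
    patience=3,
    min_lr=1e-6,
)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=50,
          callbacks=[reduce_lr])
```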

[Figure: gradient descent oscillating around a minimum when the learning rate is too large]

Soc
  • Can you please add an edit to your answer about the following: if the image you added shows a local minimum, but there is a better local minimum elsewhere, then the reduction might also cause us to stay in this location rather than "escape" and find the better minimum, right? – 21kc Jan 24 '21 at 12:08
  • Yes, that's true, but remember that you can tune the patience parameter of ReduceLROnPlateau to define how many epochs you think is appropriate to let the current lr "escape" this local minimum. If the algorithm gets stuck in a local minimum with your initial lr, it is possible that it was not the optimal one to begin with. Furthermore, you can experiment with cyclical learning rates, which are designed to overcome these types of problems (see the sketch after this thread): [paper link](https://arxiv.org/pdf/1506.01186.pdf), [Keras function](https://github.com/psklight/keras_one_cycle_clr) – Soc Jan 24 '21 at 12:46
  • @Soc With regard to cyclical learning rates, they can also allow the model to jump over saddle points, right? Whereas with `ReduceLROnPlateau`, if we happen to land on a saddle point, we can't escape. – ado sar Aug 10 '23 at 12:09
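
As a follow-up to the cyclical-learning-rate comments above, here is a minimal sketch of a triangular schedule in the spirit of the linked paper, implemented per epoch with Keras's LearningRateScheduler callback (base_lr, max_lr, and step_size are illustrative assumptions, as are model and the training arrays):

```python
import math
from tensorflow import keras

def triangular_clr(epoch, base_lr=1e-4, max_lr=1e-2, step_size=5):
    # Triangular cyclical schedule: the lr ramps linearly from base_lr up to
    # max_lr over step_size epochs, then back down, repeating each cycle.
    cycle = math.floor(1 + epoch / (2 * step_size))
    x = abs(epoch / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

clr = keras.callbacks.LearningRateScheduler(
    lambda epoch, lr: triangular_clr(epoch))

model.fit(x_train, y_train, epochs=30, callbacks=[clr])
```

The periodic rise back to max_lr is what gives the model a chance to hop out of sharp local minima or saddle points, which a monotonically decreasing schedule like ReduceLROnPlateau cannot do.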