When gradient descent quantitatively suggests by how much the biases and weights should be adjusted, what role does the learning rate play? I am a beginner, so could someone please explain this to me?
Asked
Active
Viewed 71 times
- Ask this question at https://stats.stackexchange.com/. – nbro Sep 18 '18 at 09:04
- Welcome to SO; please do take some time to read [What topics can I ask about here?](https://stackoverflow.com/help/on-topic) – desertnaut Sep 18 '18 at 09:34
1 Answer
The learning rate is a hyper-parameter that controls how much we adjust the weights of the network with respect to the loss gradient. The lower the value, the slower we travel along the downward slope. While a low learning rate can help ensure we do not skip over a local minimum, it also means it may take a long time to converge, especially if we get stuck on a plateau region.
new_weight = existing_weight - learning_rate * gradient
If the learning rate is too small, gradient descent can be very slow.
If the learning rate is too large, gradient descent can overshoot the minimum; it may fail to converge, and it may even diverge. A small sketch illustrating both cases follows.
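
For illustration, here is a minimal Python sketch (not part of the original answer) that applies the update rule above to a simple one-dimensional quadratic loss; the function name, the loss, and the specific learning-rate values are just example choices.

    # Minimal sketch: gradient descent on f(w) = (w - 3)**2, whose minimum is at w = 3.
    # The gradient of f is 2 * (w - 3).

    def gradient_descent(learning_rate, steps=50, start=0.0):
        w = start
        for _ in range(steps):
            gradient = 2 * (w - 3)            # derivative of (w - 3)**2
            w = w - learning_rate * gradient  # new_weight = existing_weight - learning_rate * gradient
        return w

    print(gradient_descent(0.01))  # too small: after 50 steps w is still well short of 3
    print(gradient_descent(0.1))   # moderate: converges very close to 3
    print(gradient_descent(1.1))   # too large: each step overshoots and w diverges

Running it shows the three behaviours described above: slow progress, clean convergence, and divergence, purely as a function of the learning rate.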

Bhaskar