
When gradient descent quantitatively suggests by how much the biases and weights should be adjusted, what role does the learning rate play? I am a beginner, so please enlighten me on this.

1 Answer


The learning rate is a hyper-parameter that controls how much we adjust the weights of the network with respect to the loss gradient. The lower the value, the slower we travel along the downward slope. While a low learning rate might be a good idea in terms of making sure that we do not miss any local minima, it can also mean that we take a long time to converge, especially if we get stuck on a plateau region.

new_weight = existing_weight - learning_rate * gradient

If the learning rate is too small, gradient descent can be slow.

If the learning rate is too large, gradient descent can overshoot the minimum. It may fail to converge, or it may even diverge.
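Here is a minimal sketch showing both behaviors, assuming a toy one-dimensional loss f(w) = w² (its gradient is 2w, and its minimum is at w = 0); the function, starting point, and step counts are illustrative assumptions, not part of the original answer.

```python
def gradient_descent(learning_rate, steps=20, w=5.0):
    """Run plain gradient descent on the toy loss f(w) = w**2
    and return the final weight (assumed example, minimum at w = 0)."""
    for _ in range(steps):
        gradient = 2 * w                  # derivative of w**2
        w = w - learning_rate * gradient  # the update rule from the answer
    return w

print(gradient_descent(0.001))  # too small: barely moves toward 0 (slow)
print(gradient_descent(0.1))    # moderate: converges close to 0
print(gradient_descent(1.1))    # too large: overshoots each step, |w| grows (diverges)
```

With a rate of 0.001 the weight shrinks only slightly per step, with 0.1 it converges near 0 in a few steps, and with 1.1 each update jumps past the minimum by more than it started with, so the iterates blow up.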
