Given the derivative of the cost function with respect to the weights or biases of the neurons of a neural network, how do I adjust these neurons to minimize the cost function? Do I just subtract the derivative, multiplied by a constant, from each individual weight and bias? If constants are involved, how do I know what a reasonable value is?
-
@jorgenkg Why do you multiply by the weight itself? – Ron Lauterbach Apr 09 '18 at 05:12
-
(deleted the erroneous comment) Sorry, too quick on the trigger. `weight -= learning_rate * (delta * activation_strength)` – jorgenkg Apr 09 '18 at 06:58
-
Maybe this code can nudge you in the right direction: https://github.com/jorgenkg/python-neural-network/tree/master/nimblenet/learning_algorithms/backpropagation. Disclaimer: I wrote it – jorgenkg Apr 09 '18 at 06:59
1 Answer
You're right about how to perform the update; this is what is done in gradient descent in its various forms. Learning rates (the constant you are referring to) are generally very small, on the order of 1e-6 to 1e-8. There are numerous articles on the web covering both of these concepts.
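In code, the update you describe looks like the following minimal sketch. The names `grad_weights` and `grad_biases` are hypothetical stand-ins for the derivatives dC/dw and dC/db you already computed via backpropagation; they aren't from any particular library.

```python
import numpy as np

learning_rate = 1e-6  # the small constant in question

def gradient_descent_step(weights, biases, grad_weights, grad_biases):
    # Move each parameter a small step against its gradient.
    weights = weights - learning_rate * grad_weights
    biases = biases - learning_rate * grad_biases
    return weights, biases

# Example usage with dummy 2x2 weights and a bias vector:
w, b = gradient_descent_step(np.zeros((2, 2)), np.zeros(2),
                             np.ones((2, 2)), np.ones(2))
```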
In the interest of a direct answer, though: it is good to start out with a small learning rate (on the order suggested above) and check that the loss is decreasing (via plotting). If the loss decreases, you can raise the learning rate a bit; I recommend raising it to 3x its current value. For example, if it is 1e-6, raise it to 3e-6 and check again that your loss is still decreasing. Keep doing this until the loss no longer decreases nicely. This image should give some nice intuition on how learning rates affect the loss curve (the image comes from Stanford's cs231n lecture series).
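Here is a toy, runnable illustration of that procedure; the setup (a single weight fitting y = 2x) is mine, not from the lecture. It walks a schedule of learning rates that grows by 3x each round and prints the final loss for each: the loss keeps improving until the rate gets too large, at which point it blows up.

```python
import numpy as np

x = np.linspace(-100, 100, 50)
y = 2.0 * x  # the value the single weight should learn is w = 2

def final_loss(learning_rate, steps=40):
    w = 0.0
    for _ in range(steps):
        grad = np.mean(2.0 * (w * x - y) * x)  # d/dw of mean squared error
        w -= learning_rate * grad              # the gradient descent update
    return np.mean((w * x - y) ** 2)

lr = 1e-6
for _ in range(7):
    # Keep raising the rate while the final loss still comes down; once it
    # stops improving (or explodes), you have raised it too far.
    print(f"lr={lr:.2e}  final loss={final_loss(lr):.4g}")
    lr *= 3
```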
You want to raise the learning rate so that the model doesn't take as long to train, but you don't want to raise it too much, because then it is possible to overshoot the local minimum you're descending towards and for the loss to increase (the yellow curve above). This is an oversimplification, since the loss landscape of a neural network is highly non-convex, but it is the general intuition.
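To make the overshoot concrete, here is a tiny sketch on the one-dimensional convex loss f(w) = w² (a toy stand-in, not a real network). The gradient is 2w, so each update multiplies w by (1 - 2 * learning_rate): a small rate shrinks w toward the minimum, while a rate of 1.5 flips the sign of w and doubles its magnitude every step, so the loss climbs just like the yellow curve.

```python
def descend(learning_rate, steps=10):
    w = 1.0
    for _ in range(steps):
        w -= learning_rate * 2 * w  # gradient descent on f(w) = w**2
    return w ** 2  # the final loss

print(descend(0.1))  # ~0.01: converges toward the minimum at w = 0
print(descend(1.5))  # 1048576.0: each step doubles |w|, so the loss grows
```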
