
I'm implementing gradient descent for an assignment and am confused about when the weights are supposed to stop updating. Do I stop updating the weights when they don't change very much, i.e. when |weight_i - previous_weight_i| <= (some threshold)?
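Something like this is what I mean, as a minimal sketch (the names weights, prev_weights, and THRESHOLD are placeholders, not my actual variable names):

    import numpy as np

    THRESHOLD = 1e-6  # placeholder convergence tolerance

    def converged(weights, prev_weights):
        # Per-weight check: declare convergence once every weight
        # has stopped moving by more than THRESHOLD.
        return np.all(np.abs(weights - prev_weights) <= THRESHOLD)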

Also, with the way I'm currently implementing it above, weight_1 can finish updating before weight_2. Is that right, or should all the weights finish at the same time?

Aetos11

2 Answers


Simply put, you stop when the cost/loss is minimized, i.e. when further updates no longer reduce it.

You should compute the gradient by taking the partial derivative of the loss with respect to each weight, and update all the weights together on every step.
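For instance, here is a minimal sketch of batch gradient descent for linear regression with squared loss, where every weight is updated together from the partial derivatives and the loop stops once the loss stops decreasing (X, y, lr, tol, and max_iters are illustrative names, not anything from the question):

    import numpy as np

    def gradient_descent(X, y, lr=0.01, tol=1e-8, max_iters=10000):
        w = np.zeros(X.shape[1])
        prev_loss = np.inf
        for _ in range(max_iters):
            residual = X @ w - y
            loss = 0.5 * np.mean(residual ** 2)
            # Gradient = vector of partial derivatives of the loss
            # with respect to every weight, computed in one shot.
            grad = X.T @ residual / len(y)
            w -= lr * grad  # all weights move in the same step
            if prev_loss - loss <= tol:  # loss no longer decreasing
                break
            prev_loss = loss
        return w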

pinxue

If you have access to the gradient, you can stop when the l2-norm of the gradient is below some threshold. If not, you can use your method on the l2-norm of the difference between successive weight vectors; in that case the threshold is usually not absolute but relative, e.g. measured against ||weight_i|| + small_delta. You might also find this link useful: https://math.stackexchange.com/questions/1618330/stopping-criteria-for-gradient-method

Note that you need some assumptions about the function you are minimizing to guarantee that these criteria actually find a minimum: a minimum must exist, and the starting point must lie in its basin of attraction. This is automatic for strongly convex functions but not true in general.
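A minimal sketch of both criteria (grad, w, w_prev, and the tolerances are assumed names, not a reference implementation):

    import numpy as np

    def stop_on_gradient(grad, eps=1e-6):
        # Criterion 1: the l2-norm of the gradient is below a threshold.
        return np.linalg.norm(grad) < eps

    def stop_on_weights(w, w_prev, rel_tol=1e-6, small_delta=1e-12):
        # Criterion 2: change in the weights, measured relative to
        # ||w|| + small_delta so the test also behaves near w = 0.
        return np.linalg.norm(w - w_prev) <= rel_tol * (np.linalg.norm(w) + small_delta)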

Juan Carlos Ramirez