Why do small tweaks to the momentum constant affect SGD results so greatly?

Asked Apr 23 '18 at 19:33

I'm playing around with Keras and decided to build a simple neural network to do univariate linear regression (epochs=25, lrate=0.001, decay=100). I notice that when I set momentum in [0.7, 0.9], the r^2 of my regression is always >0.95, but if I drop momentum below 0.7 I suddenly start getting extremely poor results: some runs return an r^2 of 0.5, others -2 or 0.1. The variance between runs is high.

Is there some intuition for why this might happen? I understand momentum is good for SGD, but I'm surprised to see such a stark drop-off in model quality as you slightly tweak the momentum constant...
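For anyone who wants to reproduce this, here is a minimal sketch of the kind of experiment I mean, written in plain NumPy rather than Keras so it is self-contained. The synthetic data, the batch size of 32, the function name `fit_with_momentum`, and the omission of the decay term are all assumptions made for illustration, not my exact setup. The update is the classical momentum rule (velocity = momentum * velocity - lr * gradient).

```python
import numpy as np


def fit_with_momentum(momentum, lr=0.001, epochs=25, seed=0):
    """Fit y = w*x + b by mini-batch SGD with classical momentum; return r^2."""
    rng = np.random.default_rng(seed)

    # Synthetic univariate data (assumed; the real data set is not shown).
    x = rng.uniform(-1.0, 1.0, size=200)
    y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=200)

    # Random initial parameters and zero initial velocities.
    w, b = rng.normal(size=2)
    vw = vb = 0.0

    batch_size = 32  # assumed; not stated in the question
    for _ in range(epochs):
        order = rng.permutation(len(x))
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            err = w * x[idx] + b - y[idx]

            # Gradients of the mean squared error on this batch.
            gw = 2.0 * np.mean(err * x[idx])
            gb = 2.0 * np.mean(err)

            # Classical momentum update.
            vw = momentum * vw - lr * gw
            vb = momentum * vb - lr * gb
            w += vw
            b += vb

    # Coefficient of determination on the training data.
    pred = w * x + b
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Compare several momentum values across a few random seeds.
    for m in (0.0, 0.5, 0.7, 0.9):
        scores = [fit_with_momentum(m, seed=s) for s in range(5)]
        print(m, np.round(scores, 3))
```

Running the loop at the bottom prints the r^2 for several seeds at each momentum value, which makes it easy to see how much the spread between runs changes as momentum is lowered.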

user49404

In my experience, you should get similar results if your model is linear and a sufficient number of epochs were completed. Can you edit your question to show how you defined and trained your model? A small reproducible example of the problem would be best. – ldavid Apr 23 '18 at 20:25
The why in this case is not a programming question. – Dr. Snoopy Apr 24 '18 at 07:25

0 Answers