I'm just playing around with Keras and decided to build a simple neural network for univariate linear regression (epochs=25, lrate=0.001, decay=100). I notice that when I set momentum anywhere in [0.7, 0.9], the r^2 of my regression is always > 0.95, but if I drop momentum below 0.7 the results suddenly become very poor and highly variable: some runs return an r^2 of 0.5, others -2 or 0.1, etc.
Is there some intuition for why this might happen? I understand that momentum generally helps SGD, but I'm surprised to see such a stark drop-off in model quality from a slight change to the momentum constant.
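For anyone who wants to poke at this without Keras, here is a minimal NumPy sketch of the same kind of setup. Several details are assumptions, not taken from my actual script: synthetic data y = 3x + 0.5 + noise, batch size 32, a small random initialization, classical (non-Nesterov) momentum v = m*v - lr_t*g followed by p = p + v, and a learning-rate schedule lr_t = lr / (1 + decay * iterations), which is how I understand the legacy Keras SGD `decay` argument to work. The helpers `fit_sgd_momentum` and `r2_score` are hypothetical names. The script sweeps momentum over a few seeds and prints r^2 per run; the numbers depend on the data and batch size, so I'm not claiming they match the ones above.

```python
import numpy as np


def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_sgd_momentum(x, y, momentum, lr=0.001, decay=100.0, epochs=25,
                     batch_size=32, seed=0):
    """Hypothetical helper: fit y ~ w*x + b by minibatch SGD with momentum on MSE."""
    rng = np.random.default_rng(seed)
    w, b = rng.normal(scale=0.05, size=2)  # assumed small random init
    vw = vb = 0.0                           # momentum buffers
    iterations = 0
    for _ in range(epochs):
        order = rng.permutation(len(x))     # reshuffle each epoch
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            err = w * x[idx] + b - y[idx]
            grad_w = 2.0 * np.mean(err * x[idx])  # d(MSE)/dw
            grad_b = 2.0 * np.mean(err)           # d(MSE)/db
            # Assumed time-based decay, as in legacy Keras SGD
            lr_t = lr / (1.0 + decay * iterations)
            # Classical momentum: v = m*v - lr_t*g, then p = p + v
            vw = momentum * vw - lr_t * grad_w
            vb = momentum * vb - lr_t * grad_b
            w += vw
            b += vb
            iterations += 1
    return w, b


# Assumed synthetic data: y = 3x + 0.5 + Gaussian noise
data_rng = np.random.default_rng(42)
x = data_rng.uniform(-1.0, 1.0, size=1000)
y = 3.0 * x + 0.5 + data_rng.normal(scale=0.1, size=1000)

for m in [0.0, 0.5, 0.7, 0.9]:
    scores = []
    for seed in range(5):
        w, b = fit_sgd_momentum(x, y, momentum=m, seed=seed)
        scores.append(r2_score(y, w * x + b))
    print(f"momentum={m}: r^2 per run = {np.round(scores, 2)}")
```

Passing decay=0.0 to fit_sgd_momentum gives a baseline without the learning-rate schedule, which can be swept the same way for comparison.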