
I'm trying to adapt a batch gradient descent algorithm from a previous question to do stochastic gradient descent, but my cost seems to get stuck pretty far from the minimum value (around 1750 in the example, when the minimum is around 1450). Once it reaches that value, it just starts oscillating there. I also tried shuffling `range(0, x.shape[0]-1)` every outer iteration `l`, but it didn't make any difference. I expect oscillations around the optimal value, but this just seems too far off, so I think there must be a mistake.

import numpy as np

y = np.asfarray([[400], [330], [369], [232], [540]])
x = np.asfarray([[2104,3], [1600,3], [2400,3], [1416,2], [3000,4]])
x = np.concatenate((np.ones((5,1)), x), axis=1)
theta = np.asfarray([[0], [.5], [.5]])

fscale = np.sum(x, axis=0)
x /= fscale

alpha = .1

for l in range(1,100000):
    for i in range(0, x.shape[0]-1):
        h = np.dot(x, theta)
        gradient = ((h[i:i+1] - y[i:i+1]) * x[i:i+1]).T
        theta -= alpha * gradient
        print(((h - y)**2).sum(), theta.squeeze() / fscale)
wizplum
  • Tune the learning rate and introduce a learning-rate decay. And yes, introduce randomness somehow (at least a shuffle; maybe sampling with replacement). On a side note: that's not really nice code to work with for SO. Why not use common terms like epoch, sample and co.? That sample-indexing is also ugly: ```h[i]``` would be enough. Maybe you wanted to make it general for mini-batches, but again: mini-batches should not be neighbors in some unshuffled array. – sascha May 30 '17 at 19:34
  • I tried hardcoding some decay for testing purposes with the sample e.g. `if ((h - y)**2).sum() < 2000: alpha = .01` but that doesn't seem to do much good, just gets me stuck at a higher cost actually. I also tried `i = random.randrange(0, x.shape[0]-1)` without much success. – wizplum May 31 '17 at 17:36
  • That learning schedule you used is atypical. Just decrease after x epochs. It's also hard to reason about your random sampling, as the line you proposed doesn't necessarily fit the code. – sascha May 31 '17 at 18:47

0 Answers