I'm trying to adapt a batch gradient descent algorithm from a previous question to do stochastic gradient descent, but my cost seems to get stuck pretty far from the minimum value (in the example, around 1750 when the minimum is around 1450). Once it reaches that value, it just oscillates there. I also tried shuffling range(0, x.shape[0]-1) on every iteration of l, but it didn't make any difference. I expected oscillations around the optimal value, but this seems too far off, so I think there must be a mistake.
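
For concreteness, per-epoch shuffling of the kind described above could look roughly like the sketch below. It is an illustration under assumptions, not necessarily the exact code that was tried: the helper name sgd_epoch, the variable order, the toy data x_demo / y_demo, and the seeded np.random.default_rng generator are all introduced here for the example. The update rule is the same per-sample update used in the question.

```python
import numpy as np

def sgd_epoch(x, y, theta, alpha, indices, rng):
    """One pass of per-sample updates over `indices`, visited in random order.

    Hypothetical helper for illustration; theta is updated in place.
    """
    order = rng.permutation(indices)  # fresh shuffled copy on every call
    for i in order:
        h_i = x[i:i+1] @ theta                       # prediction for sample i, shape (1, 1)
        gradient = ((h_i - y[i:i+1]) * x[i:i+1]).T   # shape (n_features, 1)
        theta -= alpha * gradient
    return theta

# Toy data (an assumption for the demo): y = 1 + 2 * feature, with a bias column.
rng = np.random.default_rng(0)
x_demo = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y_demo = np.array([[1.0], [3.0], [5.0]])
theta_demo = np.zeros((2, 1))

for _ in range(2000):
    # np.arange(n) covers every row index, 0 .. n-1
    sgd_epoch(x_demo, y_demo, theta_demo, 0.1, np.arange(x_demo.shape[0]), rng)

print(theta_demo.ravel())
```

Here the index set is passed in explicitly, so it is easy to see which rows each epoch visits. Note that np.arange(x.shape[0]) yields every row index, whereas range(0, x.shape[0]-1) stops one short of the last row.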
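
    import numpy as np

    # Targets and raw features
    y = np.array([[400], [330], [369], [232], [540]], dtype=float)
    x = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]], dtype=float)
    x = np.concatenate((np.ones((5, 1)), x), axis=1)  # prepend a column of ones
    theta = np.array([[0], [.5], [.5]], dtype=float)

    # Scale each column by its sum
    fscale = np.sum(x, axis=0)
    x /= fscale

    alpha = .1
    for l in range(1, 100000):
        for i in range(0, x.shape[0]-1):
            h = np.dot(x, theta)
            # Gradient of the squared error for sample i only
            gradient = ((h[i:i+1] - y[i:i+1]) * x[i:i+1]).T
            theta -= alpha * gradient
        # Total squared error, and theta rescaled to the original feature units
        print(((h - y)**2).sum(), theta.squeeze() / fscale)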
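
One way to sanity-check the minimum cost quoted above is to solve the least-squares problem directly on the same scaled data and evaluate the same sum of squared errors. This is only a minimal sketch: the names cost and theta_ls are introduced here, and it assumes the same x, y, and column-sum scaling as the code in the question.

```python
import numpy as np

# Same data and scaling as in the question.
y = np.array([[400], [330], [369], [232], [540]], dtype=float)
x = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]], dtype=float)
x = np.concatenate((np.ones((5, 1)), x), axis=1)
fscale = np.sum(x, axis=0)
x /= fscale

def cost(theta):
    """Sum of squared residuals over all rows (the quantity the loop prints)."""
    return float(((x @ theta - y) ** 2).sum())

# Closed-form least-squares fit; lstsq returns (solution, residuals, rank, singular values).
theta_ls, *_ = np.linalg.lstsq(x, y, rcond=None)

# Reference cost and parameters in the original feature units.
print(cost(theta_ls), theta_ls.squeeze() / fscale)
```

Comparing this reference cost with the value the SGD loop settles at shows how large the gap really is, independent of the learning rate or the visiting order.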