I have a question about how to update theta during stochastic gradient descent (SGD). I can think of two ways to update theta:
1) Use the previous theta to compute the hypotheses for all samples first, and then update theta once per sample using those precomputed hypotheses. Like:

hypothese = np.dot(X, theta)
for i in range(0, m):
    theta = theta + alpha * (y[i] - hypothese[i]) * X[i]
2) Another way: while scanning the samples, compute the hypothesis for sample i using the latest theta. Like:

for i in range(0, m):
    h = np.dot(X[i], theta)
    theta = theta + alpha * (y[i] - h) * X[i]
I checked reference SGD code, and the second way seems to be the correct one. But in my experiments the first one converges faster and gives a better result than the second. Why does the wrong way perform better than the correct way?
I have also attached the complete code below:
import numpy as np

def SGD_method1():
    maxIter = 100        # max iterations
    alpha = 1e-4         # learning rate
    m, n = np.shape(X)   # X[m, n], m: #samples, n: #features
    theta = np.zeros(n)  # initial theta
    for iter in range(0, maxIter):
        hypothese = np.dot(X, theta)  # compute all hypotheses with the same (old) theta
        for i in range(0, m):
            theta = theta + alpha * (y[i] - hypothese[i]) * X[i]
    return theta
def SGD_method2():
    maxIter = 100        # max iterations
    alpha = 1e-4         # learning rate
    m, n = np.shape(X)   # X[m, n], m: #samples, n: #features
    theta = np.zeros(n)  # initial theta
    for iter in range(0, maxIter):
        for i in range(0, m):
            h = np.dot(X[i], theta)  # compute the hypothesis for sample i with the latest theta
            theta = theta + alpha * (y[i] - h) * X[i]
    return theta
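
For reference, here is a minimal way to call both functions. The X and y below are only a made-up synthetic linear data set so that the snippet runs on its own; they are not my actual data:

# Illustrative synthetic data only (not my real data set):
# y = X.dot(true_theta) + small noise, with a bias column appended to X.
rng = np.random.RandomState(0)
m, n = 100, 3
X = np.hstack([rng.rand(m, n - 1), np.ones((m, 1))])  # last column acts as the bias term
true_theta = np.array([2.0, -1.0, 0.5])
y = X.dot(true_theta) + 0.01 * rng.randn(m)

theta1 = SGD_method1()
theta2 = SGD_method2()
print("method1:", theta1)
print("method2:", theta2)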