
I am trying to implement the stochastic gradient descent algorithm. The first solution works:

import numpy as np

def gradientDescent(x, y, theta, alpha):
    xTrans = x.transpose()
    for i in range(0, 99):               # 99 gradient descent iterations (i is unused here)
        hypothesis = np.dot(x, theta)    # predictions for all samples at once
        loss = hypothesis - y            # error on every sample
        gradient = np.dot(xTrans, loss)  # gradient over the whole data set
        theta = theta - alpha * gradient
    return theta

This solution gives the right theta values, but the following algorithm doesn't work:

def gradientDescent2(x, y, theta, alpha):
    xTrans = x.transpose()
    for i in range(0, 99):
        hypothesis = np.dot(x[i], theta)  # prediction for sample i
        loss = hypothesis - y[i]          # error on sample i
        gradientThetaZero = loss * x[i][0]
        gradientThetaOne = loss * x[i][1]
    theta[0] = theta[0] - alpha * gradientThetaZero
    theta[1] = theta[1] - alpha * gradientThetaOne
    return theta

I don't understand why solution 2 does not work; it basically does the same thing as the first algorithm.

I use the following code to produce data:

def genData():
    x = np.random.rand(100, 2)
    y = np.zeros(shape=100)
    for i in range(0, 100):
        x[i][0] = 1  # bias term: x0 = 1 for every sample
        e = np.random.uniform(-0.1, 0.1, size=1)  # small uniform noise
        # our target variable
        y[i] = np.sin(2 * np.pi * x[i][1]) + e[0]
    return x, y

And I use it in the following way:

x,y = genData()
theta = np.ones(2)
theta = gradientDescent2(x,y,theta,0.005)
print(theta)

I hope you can help me! Best regards, Felix

  • If you debug your code with print, shape, and break statements, you'll see why they aren't the same – Nick Becker Oct 24 '16 at 13:14
  • Just so I properly understand: solution 1 (the first code snippet that you posted) is *not* stochastic gradient descent, but solution 2 is your attempt to alter solution 1 so that it becomes stochastic gradient descent? Also, in your `genData` code, why are you setting `x[i][0]=1`, so that the entire first column of `x` is 1? – Angus Williams Oct 24 '16 at 14:51
  • The first column is 1 because x0 = 1 – Cyberlander Oct 24 '16 at 16:11

1 Answer


Your second code example recomputes and overwrites the gradient on each iteration over your observation data.

In the first snippet, you adjust your parameters on every loop iteration, using the error (loss) computed over the whole data set.

In the second snippet, you calculate the point-wise gradient on each iteration but never use it inside the loop: the theta updates sit outside the for loop (note their indentation), so your final update effectively trains on only the very last data point.
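
To see the overwrite concretely, here is a toy illustration (the values are just made up): plain assignment keeps only the last iteration's result.

g = 0
for i in range(3):
    g = i * 10   # '=' overwrites g on every pass
print(g)         # prints 20: only the last iteration survives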

If instead you accumulate the gradients within the loop by summing ( += ), the result will be closer to what you're looking for, since it then expresses the gradient of the loss function with respect to your parameters over the entire observation set.
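
A minimal sketch of that fix, assuming the same data layout and calling convention as in your question (I kept your function name, and used len(x) instead of the hard-coded 99 so the last sample isn't skipped):

import numpy as np

def gradientDescent2(x, y, theta, alpha):
    gradientThetaZero = 0.0
    gradientThetaOne = 0.0
    for i in range(len(x)):                  # visit every observation
        hypothesis = np.dot(x[i], theta)
        loss = hypothesis - y[i]
        gradientThetaZero += loss * x[i][0]  # accumulate instead of overwrite
        gradientThetaOne += loss * x[i][1]
    # one batch update from the accumulated gradient
    theta[0] = theta[0] - alpha * gradientThetaZero
    theta[1] = theta[1] - alpha * gradientThetaOne
    return theta

Note that this performs a single batch update per call; to reproduce the 99 updates of your first snippet you would call it repeatedly (or wrap the body in an outer loop).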

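As the comments above point out, the accumulated version is batch gradient descent, matching your first snippet. For genuinely *stochastic* gradient descent, the parameter update moves inside the loop, so theta changes after every individual sample. A minimal sketch under the same assumptions (the name stochasticGradientDescent is mine, not from your code):

import numpy as np

def stochasticGradientDescent(x, y, theta, alpha):
    # hypothetical variant: update theta immediately after each sample
    for i in range(len(x)):
        hypothesis = np.dot(x[i], theta)
        loss = hypothesis - y[i]
        theta[0] = theta[0] - alpha * loss * x[i][0]
        theta[1] = theta[1] - alpha * loss * x[i][1]
    return theta
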
T3am5hark