I'm implementing simple gradient descent in Octave, but it's not working. Here is the data I'm using:
X = [1 2 3
1 4 5
1 6 7]
y = [10
11
12]
theta = [0
0
0]
alpha = 0.001 and itr = 50
This is my gradient descent implementation:
function theta = Gradient(X, y, theta, alpha, itr)
  % Batch gradient descent: update every component of theta on each pass
  m = length(y);   % number of training examples
  for i = 1:itr
    % compute the new components using the current theta (simultaneous update)
    th1 = theta(1) - alpha * (1/m) * sum((X * theta - y) .* X(:, 1));
    th2 = theta(2) - alpha * (1/m) * sum((X * theta - y) .* X(:, 2));
    th3 = theta(3) - alpha * (1/m) * sum((X * theta - y) .* X(:, 3));
    theta(1) = th1;
    theta(2) = th2;
    theta(3) = th3;
  end
end
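For reference, this is roughly how I set everything up and call the function (just a sketch of my driver script, using the data above):

X = [1 2 3; 1 4 5; 1 6 7];
y = [10; 11; 12];
theta = [0; 0; 0];
alpha = 0.001;
itr = 50;
theta = Gradient(X, y, theta, alpha, itr)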
My questions are:

- It produces some values of theta, which I then use in

  theta * [1 2 3]

  and expect the output to be around 10 (the first value of y). Is that the correct way to test the hypothesis [h(x) = theta' * x]? (See the snippet after this list for what I mean.)
- How can I determine how many iterations it should run? If I give it 1500 iterations, theta becomes extremely small (the values are printed in e notation).
- If I use double-digit numbers in X, theta becomes very small again, even with fewer than 5 iterations.
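To make the shapes explicit, this is my understanding of the hypothesis check for the first training example (the column vector x below is just the first row of X transposed):

x = [1; 2; 3];    % first training example as a column vector
h = theta' * x    % hypothesis h(x) = theta' * x; I expect this to be near y(1) = 10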
I've been struggling with this for a long time now and haven't been able to resolve it myself.
Sorry for bad formatting.