
Andrew Ng's course on Coursera, which is Stanford's Machine Learning course, features programming assignments that involve implementing the algorithms taught in class. The goal of this assignment is to implement linear regression via gradient descent, given an input of X, y, theta, alpha (the learning rate), and the number of iterations.

I implemented this solution in Octave, the language prescribed by the course.

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)

    m = length(y);                    % number of training examples
    J_history = zeros(num_iters, 1);  % cost recorded at every iteration
    numJ = size(theta, 1);            % number of parameters

    for iter = 1:num_iters
        for i = 1:m
            for j = 1:numJ
                % update theta(j) using the i-th example only
                temp = theta(j) - alpha / m * X(i, j) * ((X * theta)(i, 1) - y(i, 1));
                theta(j) = temp;
            end

            prediction = X * theta;   % computed but never used

            J_history(iter, 1) = computeCost(X, y, theta);
        end
    end

end


For completeness, here is the cost function:

function J = computeCost(X, y, theta)

    m = length(y);                 % number of training examples

    prediction = X * theta;        % hypothesis for every example
    error = (prediction - y) .^ 2; % squared residuals
    J = 1 / (2 * m) * sum(error);  % half the mean squared error

end

This does not pass the submit() function. The submit() function simply validates the implementation by running it against an unknown test case.

I have checked other questions on Stack Overflow, but I still can't figure out what is wrong. :)

Thank you very much!

  • I didn't downvote, but I agree with them. Without any context, it is difficult to help you. We don't know what data the `submit()` function gives it. If you have some toy data on which the code produces unsatisfactory/erroneous results, then please provide or at least describe it. More importantly, are your gradients correct? You don't say what the cost function is. We'll probably be able to guess from looking at the gradient, but I personally won't take the time to do that. Also, just a minor thing, you don't need the `temp` variable, you can just do `your_theta-=delta_theta`. – Ash May 26 '18 at 12:46
  • Also, looking at the code quickly, it's usually a bad idea to index non-cell vector-shaped variables with `theta(j)` instead of `theta(1,j)` or `theta(j,1)`. Also, have you tried playing with the step size? Finally, I might be mistaken, but I wouldn't normalize the gradient by `m` if I were you; it doesn't seem to make sense here. – Ash May 26 '18 at 12:50
  • Sir @Ash could you please give feedback! I really appreciate your help with my updated question :) –  May 26 '18 at 13:18
  • Oh... Please don't call me sir, I'm not that old yet! :) – Ash May 26 '18 at 15:25
  • Nevertheless, @Ash, thanks so much! :) –  May 26 '18 at 22:21

2 Answers

2

Your gradient seems to be correct, and as already pointed out in the answer given by @Kasinath P, the problem is likely that the code is too slow. You just need to vectorize it. In Matlab/Octave, you usually need to avoid for loops (note that although you have parfor in Matlab, it is not yet available in Octave). So it is always better, performance-wise, to write something like A*x instead of iterating over each row of A with a for loop. You can read about vectorization here.
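
As a rough illustration (with a made-up matrix A and vector x), the two snippets below compute the same thing, but the second one hands the work to Octave's optimized matrix routines:

% loop version: one row of A at a time (slow in Octave)
A = rand(1000, 50); x = rand(50, 1);
b_loop = zeros(1000, 1);
for i = 1:1000
    b_loop(i) = A(i, :) * x;
end

% vectorized version: a single matrix-vector product (fast)
b_vec = A * x;

disp(max(abs(b_loop - b_vec)));   % same result up to floating-point error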

If I understand correctly, X is a matrix of size m*numJ, where m is the number of examples and numJ is the number of features (or the dimension of the space where each point lies). In that case, you can rewrite your cost function as

(1/(2*m)) * (X*theta - y)' * (X*theta - y);   % since ||v||_2^2 = v'*v for any vector v in Euclidean space
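
For example, your whole computeCost could be collapsed to something like this (just a sketch that keeps the name and signature from your question):

function J = computeCost(X, y, theta)
    m = length(y);
    err = X * theta - y;               % m x 1 vector of residuals
    J = (1 / (2 * m)) * (err' * err);  % same as sum(err.^2) / (2*m)
end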

Now, we know from basic matrix calculus that for any two vectors s and v that are functions from R^{num_J} to R^m, the Jacobian of s^{t}v is given by

s^{t}*Jacobian(v) + v^{t}*Jacobian(s)   % this Jacobian will have size 1*num_J

Applying that to your cost function, we obtain

jacobian = (1/m) * (theta'*X' - y') * X;
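
If you want to convince yourself that this gradient is right, a quick numerical check on made-up data (central finite differences against your computeCost) looks like this:

% sketch of a gradient check on a small random problem
m = 5; numJ = 3;
X = rand(m, numJ); y = rand(m, 1); theta = rand(numJ, 1);

analytic = (1/m) * X' * (X*theta - y);   % gradient = transpose of the Jacobian above

numeric = zeros(numJ, 1);
h = 1e-6;
for j = 1:numJ
    tp = theta; tp(j) += h;
    tm = theta; tm(j) -= h;
    numeric(j) = (computeCost(X, y, tp) - computeCost(X, y, tm)) / (2 * h);
end

disp(max(abs(analytic - numeric)));   % should be close to zero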

So if you just replace

for i = 1:m
    for j = 1:numJ
        %%% theta(j) updates
    end
end

with

%note that the gradient is the transpose of the Jacobian we've computed 
theta -= alpha * (1/m) * X' * (X*theta - y);

you should see a great increase in performance.
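
Putting it together, a fully vectorized version of your function could look like this (a sketch that keeps your signature and the J_history bookkeeping):

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y);
    J_history = zeros(num_iters, 1);

    for iter = 1:num_iters
        % simultaneous update of every theta(j) in one matrix expression
        theta = theta - alpha * (1/m) * X' * (X * theta - y);
        J_history(iter) = computeCost(X, y, theta);
    end
end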

Ash
1

Your computeCost code is correct. Better to follow the vectorized implementation of gradient descent; you are just iterating over the elements, which is slow and error-prone.

That course intends for you to do the vectorized implementation, as it is simple and handy at the same time. I know this because I did it myself after sweating a lot. Vectorization is good :)
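
As a quick sanity check on any implementation (loop-based or vectorized), you can run it on toy data and verify that the cost decreases; the numbers here are made up:

% toy data: y = 1 + 2*x, with the usual column of ones for the intercept
x = (1:10)';
X = [ones(10, 1), x];
y = 1 + 2 * x;
theta = zeros(2, 1);

[theta, J_history] = gradientDescent(X, y, theta, 0.01, 1500);
disp(theta);            % should move toward [1; 2]
disp(J_history(end));   % should be far smaller than J_history(1)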

  • Good observation, +1. However, I think that you can greatly improve the quality of your answer if you add more detail and remove clutter (I don't mean it in a rude way) such as "vectorization is good". Just an observation though. – Ash May 26 '18 at 15:35