5

I am taking the machine learning class on Coursera. Machine learning is a fairly new area for me. In the first programming exercise I am having some difficulty with the gradient descent algorithm. I would appreciate any help.

Here are the instructions for updating theta:

"You will implement gradient descent in the file gradientDescent.m. The loop structure has been written for you, and you only need to supply the updates to θ within each iteration.

    function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    %GRADIENTDESCENT Performs gradient descent to learn theta
    %   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
    %   taking num_iters gradient steps with learning rate alpha

    % Initialize some useful values
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);

    for iter = 1:num_iters

        % ====================== YOUR CODE HERE ======================
        % Instructions: Perform a single gradient step on the parameter vector
        %               theta.
        %
        % Hint: While debugging, it can be useful to print out the values
        %       of the cost function (computeCost) and gradient here.
        %
        % ============================================================

        % Save the cost J in every iteration
        J_history(iter) = computeCost(X, y, theta);

    end

    end

So here is what I did to update the thetas simultaneously:

    temp0 = theta(1,1) - (alpha/m)*sum((X*theta-y));
    temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X);
    theta(1,1) = temp0;
    theta(2,1) = temp1;

I am getting an error when I run this code. Can anyone help me, please?

Ram
  • What error are you getting? Matlab's error messages are usually quite helpful. – David Jun 01 '14 at 22:16
  • Here is the error that I got: Error using .* Matrix dimensions must agree. Error in gradientDescent (line 20) temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X); – Ram Jun 01 '14 at 22:28

5 Answers

20

I have explained why you can use the vectorized form:

    theta = theta - (alpha/m) * (X' * (X * theta - y));

or the equivalent

    theta = theta - (alpha/m) * ((X * theta - y)' * X)';

in this answer.

Quoting it below:


Explanation of the matrix version of the gradient descent algorithm:

This is the gradient descent algorithm to fine-tune the value of θ:

    repeat until convergence {
        θj := θj - (α/m) * Σ from i=1 to m of (hθ(x(i)) - y(i)) * xj(i)    (simultaneously for all j)
    }

Assume that the following values of X, y and θ are given:

  • m = number of training examples
  • n = number of features + 1

[the original figure shows example values of X (a 5 x 4 matrix), y (a 5 x 1 vector) and θ (a 4 x 1 vector)]

Here

  • m = 5 (training examples)
  • n = 4 (features+1)
  • X = m x n matrix
  • y = m x 1 column vector
  • θ = n x 1 column vector
  • x(i) is the ith training example
  • xj is the jth feature in a given training example

Further,

  • h(x) = ([X] * [θ]) (m x 1 matrix of predicted values for our training set)
  • h(x)-y = ([X] * [θ] - [y]) (m x 1 matrix of Errors in our predictions)

The whole objective of machine learning is to minimize the errors in our predictions. Based on the above, our error matrix E is an m x 1 column vector, as follows:

    E = h(x) - y = [E(1); E(2); ...; E(m)], where E(i) = hθ(x(i)) - y(i)

To calculate the new value of θj, we have to get a summation of all the errors (m rows) multiplied by the jth feature value of the training set X. That is, take all the values in E, individually multiply each of them by the jth feature of the corresponding training example, and add them all together. This will help us in getting the new (and hopefully better) value of θj. Repeat this process for all j, i.e. for each of the n features. In matrix form, this can be written as:

    θj := θj - (α/m) * Σ from i=1 to m of E(i) * X(i, j)        (for every j)

This can be simplified as:

    θ := θ - (α/m) * ([E]' * [X])'

  • [E]' * [X] gives us a row vector, since E' is a 1 x m matrix and X is an m x n matrix. But we are interested in getting a column vector, hence we transpose the result.

More succinctly, it can be written as:

    θ := θ - (α/m) * (([X] * [θ] - [y])' * [X])'

The same result can also be written as:

    θ := θ - (α/m) * [X]' * ([X] * [θ] - [y])
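For a concrete check of the derivation, here is a small Octave/MATLAB sketch (the data is made up purely for illustration) that computes one gradient step with the per-feature summation and with the vectorized formula, and confirms they agree:

    % Made-up toy data: m = 5 examples, n = 2 (intercept column + one feature)
    X = [1 1; 1 2; 1 3; 1 4; 1 5];
    y = [2; 4; 6; 8; 10];
    theta = zeros(2, 1);
    alpha = 0.01;
    m = length(y);

    E = X * theta - y;                          % m x 1 vector of prediction errors

    % Per-feature update (loop over j), each component computed from the old theta
    theta_loop = theta;
    for j = 1:size(X, 2)
        theta_loop(j) = theta(j) - (alpha/m) * sum(E .* X(:, j));
    end

    % Vectorized update
    theta_vec = theta - (alpha/m) * (X' * E);

    disp(max(abs(theta_loop - theta_vec)));     % should print 0 (up to round-off)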

jerrymouse
6
    theta = theta - (alpha/m) * (X' * (X * theta - y));

This is the right answer.

4

The error that you got, Error using .* Matrix dimensions must agree. Error in gradientDescent (line 20) temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X);, means that the .* operation is failing because its two operands do not have the same size. So, before that line, add the following code:

    size(X*theta-y)
    size(X)

If you want to do (X*theta-y).*X, then both X*theta-y and X should be the same size. If they aren't, you will need to check your algorithm.
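In this exercise (assuming the usual setup where X is m x 2 with a leading column of ones), the size check would typically show m x 1 for X*theta-y and m x 2 for X, which is why .* complains. A small sketch of how the element-wise product does work against a single feature column:

    % Sketch, assuming X is m x 2 with X(:,1) all ones, theta is 2 x 1, y is m x 1
    size(X * theta - y)     % m x 1 (error vector)
    size(X)                 % m x 2 -> dimensions disagree for .*

    % Element-wise multiplication against one column at a time does match:
    grad0 = sum((X * theta - y) .* X(:, 1));   % equals sum(X*theta - y), since X(:,1) is all ones
    grad1 = sum((X * theta - y) .* X(:, 2));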

David
  • I understood the point. After some research I found that someone used the code theta = theta - (alpha/m * (X * theta-y)' * X)'; and as a result the answer is [-3.6303 1.1664]. However, it does not seem correct, because the J value is not decreasing at each step. Also, theta still seems to be [0,0], which is the initial value of theta. Do you have any idea how I should write the code to update theta simultaneously? – Ram Jun 02 '14 at 09:26
  • @Ram I did this course some time ago and I totally think the problem is the * multiplication. Try your same code using .* – Ander Biguri Jun 02 '14 at 10:23
  • Thanks a lot, it works now. Now I need to understand what vectorization exactly means, so I will open another question. Anyway, thank you again. – Ram Jun 02 '14 at 14:08
4
    temp0 = theta(1,1) - (alpha/m)*sum((X*theta-y));
    temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X(:,2));
    theta(1,1) = temp0;
    theta(2,1) = temp1;

Or you can use the code below. It's simpler. Here there are only two parameters, theta1 and theta2, but if there are more parameters it scales much better:

    for i=1:2
        theta(i) = theta(i) - (alpha/m)*sum((X*theta-y).*X(:,i));
    end
Mars.
  • I think the second option does not work, since you are not updating all the values of theta simultaneously. – ant1 May 17 '20 at 16:29
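As the comment above points out, the loop overwrites theta(1) before it is used to compute theta(2), so the update is no longer simultaneous. A minimal sketch of one way to fix this (not part of the original answer) is to compute the error vector once, from the old theta, before the loop:

    errors = X * theta - y;                 % computed once, from the old theta
    for i = 1:size(X, 2)
        theta(i) = theta(i) - (alpha/m) * sum(errors .* X(:, i));
    end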
2

There is one thing to note in this question:

    X = [ones(m, 1), data(:,1)];

so

    theta = theta - (alpha / m) * (X' * (X * theta - y));

and

    temp0 = theta(1, 1) - (alpha / m) * sum((X * theta - y));
    temp1 = theta(2, 1) - (alpha / m) * sum((X * theta - y) .* X(:, 2));
    theta(1, 1) = temp0;
    theta(2, 1) = temp1;

are both correct.
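One way to see the equivalence: X(:, 1) is a column of ones, so sum((X * theta - y) .* X(:, 1)) is just sum(X * theta - y). A short sketch (with made-up toy data) that checks both updates give the same result:

    % Toy data, made up purely for illustration
    data = [1 3; 2 5; 3 7];                     % hypothetical [feature, target] rows
    X = [ones(size(data, 1), 1), data(:, 1)];
    y = data(:, 2);
    theta = [0.5; 0.5];
    alpha = 0.1;
    m = length(y);

    % Vectorized update
    theta_a = theta - (alpha / m) * (X' * (X * theta - y));

    % Component-wise update, computed from the same old theta
    temp0 = theta(1) - (alpha / m) * sum((X * theta - y));           % X(:,1) is all ones
    temp1 = theta(2) - (alpha / m) * sum((X * theta - y) .* X(:, 2));
    theta_b = [temp0; temp1];

    disp(max(abs(theta_a - theta_b)));          % should print 0 (up to round-off)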