Vectorization of a gradient descent code

Question

I am implementing batch gradient descent in MATLAB and have a problem with the update step for theta. theta is a vector with two components (two rows). X is a matrix with m rows (the number of training samples) and n = 2 columns (the number of features). y is an m x 1 vector.

During the update step, I need to set each theta(i) to

theta(i) = theta(i) - (alpha/m)*sum((X*theta-y).*X(:,i))

This can be done with a for loop, but I can't figure out how to vectorize it (because of the X(:,i) term).
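For reference, the loop version described above might look like the sketch below. The toy data and the names `err` and `theta_new` are assumptions made for illustration; they are not from the original post. The residual is computed once from the old theta so that both components are updated simultaneously.

```matlab
% Hypothetical toy data; the sizes match the question (m samples, n = 2 features).
m = 5;
X = [ones(m, 1), (1:m)'];   % m x 2
y = [3; 5; 7; 9; 11];       % m x 1
theta = [0; 0];             % 2 x 1
alpha = 0.01;

% Compute the residual once, from the old theta, so that every component
% of theta is updated from the same values.
err = X * theta - y;        % m x 1
theta_new = zeros(size(theta));
for i = 1:numel(theta)
    theta_new(i) = theta(i) - (alpha / m) * sum(err .* X(:, i));
end
theta = theta_new;
```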

Any suggestions?

If `X` has size m x 2, `theta` is 2 x 1 and `y` is m x 1, how is `X*theta` defined? How do you subtract `y` from that? And how do you multiply the result times the column vector `X(:,i)`? — Luis Mendo, Dec 23 '13 at 00:07
@LuisMendo if `X` has size m x 2 and `theta` is 2 x 1, then `X*theta` is m x 1 and we can subtract `y` (m x 1). The multiplication by `X(:,i)` is a term-by-term multiplication (`.*`) — bigTree, Dec 23 '13 at 00:10

Mad Physicist · Accepted Answer · 2015-09-20T03:59:42.350

39

Looks like you are trying to do a simple matrix multiplication, the thing MATLAB is supposedly best at.

theta = theta - (alpha/m) * (X' * (X*theta-y));
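The reason this matches the per-component formula in the question is that row i of `X'` is `X(:,i)'`, so the i-th entry of `X' * r` is `sum(r .* X(:,i))`. A small numerical check, assuming hypothetical random data and the illustrative names `r`, `g_loop` and `g_vec`, could look like this:

```matlab
% Hypothetical random data, used only to compare the two forms.
m = 6;
X = [ones(m, 1), randn(m, 1)];
y = randn(m, 1);
theta = randn(2, 1);
alpha = 0.1;

r = X * theta - y;                     % residual, m x 1

% Component-wise form from the question.
g_loop = zeros(2, 1);
for i = 1:2
    g_loop(i) = sum(r .* X(:, i));
end

% Matrix form: the i-th entry of X' * r equals sum(r .* X(:, i)).
g_vec = X' * r;

disp(max(abs(g_loop - g_vec)));        % should be at rounding-error level

theta = theta - (alpha / m) * g_vec;   % the update from the answer
```

Because `r` is computed before theta changes, all components are updated from the same old theta.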

edited Sep 20 '15 at 03:59

answered Dec 23 '13 at 00:12

@MadPhysicist works great thanks. By the way, I figured out this is not the good way of performing gradient descent for it doesn't update all the features simultaneously – bigTree Dec 23 '13 at 01:45
Isn't this only correct where `X = [ones(m, 1), data(:,1)]`? Because unlike `theta_1`, `theta_0` shouldn't be multiplied by `x^i` so `X` must contain a column of `1s`? – Quaker Feb 27 '16 at 17:25
Can this be further simplified as theta - (alpha/m)*(theta - A' * y) ? – XPD Sep 17 '20 at 04:15
@XPD only if `X'*X == I` – Mad Physicist Sep 17 '20 at 05:38
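To see why that condition is needed: expanding the product gives `X'*(X*theta - y) = (X'*X)*theta - X'*y`, which reduces to `theta - X'*y` only when `X'*X` is the identity. The sketch below illustrates this with assumed data; `Q`, `diff_Q` and `diff_X` are hypothetical names, and `Q` is built with an economy-size QR factorization purely to get orthonormal columns.

```matlab
% Hypothetical data: Q has orthonormal columns (Q' * Q is the identity up to
% rounding), while X is a generic design matrix.
m = 6;
[Q, ~] = qr(randn(m, 2), 0);       % economy-size QR
X = [ones(m, 1), (1:m)'];
y = randn(m, 1);
theta = randn(2, 1);

% With orthonormal columns the two expressions agree.
diff_Q = norm(Q' * (Q * theta - y) - (theta - Q' * y));

% With a generic X they do not.
diff_X = norm(X' * (X * theta - y) - (theta - X' * y));

fprintf('orthonormal: %g, generic: %g\n', diff_Q, diff_X);
```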

score 4 · Answer 2 · answered May 11 '18 at 19:46

In addition to the answer given by Mad Physicist, the following can also be applied.

theta = theta - (alpha/m) * sum( (X * theta - y).* X )';
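Two caveats may be worth noting here. The product `(X*theta - y) .* X` multiplies an m x 1 vector by an m x 2 matrix, which relies on implicit expansion (MATLAB R2016b or later); on older releases `bsxfun` does the same job. Also, `sum` without a dimension argument sums along the first non-singleton dimension, so passing the dimension explicitly keeps the result correct even when m = 1. A hedged comparison on hypothetical data, with illustrative variable names:

```matlab
% Hypothetical data for comparing the three forms.
m = 6;
X = [ones(m, 1), randn(m, 1)];
y = randn(m, 1);
theta = randn(2, 1);

r = X * theta - y;                          % m x 1

g_matrix = X' * r;                          % matrix form from the accepted answer
g_sum    = sum(r .* X, 1)';                 % needs implicit expansion (R2016b+)
g_bsx    = sum(bsxfun(@times, r, X), 1)';   % also works on older releases

disp(norm(g_matrix - g_sum));               % both should be at rounding-error level
disp(norm(g_matrix - g_bsx));
```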

Rishu
