I am having trouble understanding how to vectorize functions in the Machine Learning course on Coursera.
In the course, Andrew Ng explains that the hypothesis can be vectorized as the transpose of theta multiplied by x:
H(x) = theta' * X
My first problem comes when I implement this in the exercises. Why is the vectorization on paper the transpose of theta multiplied by x, while in Octave it is X times theta?
theta' * X % leads to a dimension-mismatch error, whereas X * theta works
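To make the problem concrete, here is a minimal reproduction of the shape bookkeeping in NumPy (the tiny dataset is made up just for illustration; NumPy's `X @ theta` corresponds to Octave's `X * theta`):

```python
import numpy as np

# Hypothetical tiny design matrix: m = 3 examples, n = 2 features,
# with rows as training examples (the layout the course exercises use).
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])   # shape (m, n) = (3, 2)
theta = np.array([[0.5],
                  [0.5]])    # shape (n, 1)

# On paper, h(x) = theta' * x is written for ONE example x of shape (n, 1):
x_single = X[0].reshape(-1, 1)   # shape (2, 1)
h_single = theta.T @ x_single    # (1, n) @ (n, 1) -> (1, 1), this works

# With the whole matrix, theta.T @ X would be (1, n) @ (m, n),
# which raises a shape error here, matching the error I see in Octave.
# What does work over all examples at once is:
h_all = X @ theta                # (m, n) @ (n, 1) -> (m, 1)
print(h_all.ravel())             # one prediction per training example
```

So the paper formula seems to describe a single example, while the code version stacks all examples as rows of X, which is where I get lost.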
My second problem follows from the first.
When I want to vectorize this sum from the gradient descent update:
sum((h(x) - y) .* x)
I don't really understand how, once vectorized, you get to:
X'*(h(x)-y)
Could anyone explain this?