I am having trouble understanding how to vectorize functions in the Machine Learning course on Coursera.
In the course, Andrew Ng explains that the hypothesis can be vectorized as the transpose of theta multiplied by x:
H(x) = theta' * X
My first problem comes when I implement this in the exercises. Why is the vectorization on paper the transpose of theta multiplied by x, while in Octave it is X times theta?
theta' * X % leads to a dimension-mismatch error, whereas X * theta works
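To make the problem concrete, here is a minimal reproduction of the shape bookkeeping in NumPy (the tiny dataset is made up just for illustration; NumPy's `X @ theta` corresponds to Octave's `X * theta`):

```python
import numpy as np

# Hypothetical tiny design matrix: m = 3 examples, n = 2 features,
# with rows as training examples (the layout the course exercises use).
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])   # shape (m, n) = (3, 2)
theta = np.array([[0.5],
                  [0.5]])    # shape (n, 1)

# On paper, h(x) = theta' * x is written for ONE example x of shape (n, 1):
x_single = X[0].reshape(-1, 1)   # shape (2, 1)
h_single = theta.T @ x_single    # (1, n) @ (n, 1) -> (1, 1), this works

# With the whole matrix, theta.T @ X would be (1, n) @ (m, n),
# which raises a shape error here, matching the error I see in Octave.
# What does work over all examples at once is:
h_all = X @ theta                # (m, n) @ (n, 1) -> (m, 1)
print(h_all.ravel())             # one prediction per training example
```

So the paper formula seems to describe a single example, while the code version stacks all examples as rows of X, which is where I get lost.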
My second problem follows from the first.
When I want to vectorize this sum from the gradient descent update:
sum((h(x) - y) .* x)
I don't really understand how, once vectorized, you get to:
X'*(h(x)-y)
Could anyone explain this?