There's nothing wrong with that answer... at least not initially. However, you have hardcoded the ones vector to have 97 elements. You need to ensure that the ones vector is as long as the number of training examples. If your dataset X doesn't have exactly 97 rows, the multiplication won't work, and running this on a differently shaped X would give you an incompatible dimensions error. Therefore, use the total number of training examples m to replace the hardcoded 97:
J= (1/(2*m)) * (ones(1, m) * (((X*theta)-y).^2 ));
Just to be sure that you're getting the right answer, let's create a random X, y and theta with 100 training examples and a two-element parameter vector. We'll use both expressions for the cost and show that they produce the same value:
>> rng(123);
>> X = rand(100, 2);
>> y = rand(100, 1);
>> theta = rand(2, 1);
>> m = size(X, 1);
>> J = 1 / (2 * m) * sum(((X * theta) - y).^2);
>> J2 = (1/(2*m)) * (ones(1, m) * (((X*theta)-y).^2 ));
>> format long g;
>> J
J =
0.0702559647930643
>> J2
J2 =
0.0702559647930643
A word of advice
You have determined that the sum of a vector's elements can be computed by multiplying the vector with an appropriately sized row vector of ones. I would argue that this is less efficient than it needs to be: for this particular cost function, you can instead take the dot product of the vector produced by X*theta - y with itself. The dot product can conveniently be computed as v.' * v, where v is a column vector. This is simply a matrix multiplication where the left operand is a row vector and the right operand is a column vector. I'll let you verify this yourself, but if you work out what this operation does element by element, you'll see it is exactly the dot product.
By virtue of the above formulation, taking the dot product of X*theta - y with itself sums the squared values of every one of its elements. Therefore, do this instead:
d = X*theta - y;
J = (1 / (2*m)) * (d.' * d);
You will also see that you get the same result:
>> d = X*theta - y;
>> J = (1 / (2*m)) * (d.' * d)
J =
0.0702559647930643
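Since the claim above is about efficiency, here's a rough sketch of how you could benchmark the three formulations yourself. The timings from tic/toc are machine dependent and noisy, so treat them as indicative only; the important check is that all three produce the same cost:

```matlab
% Rough benchmark of the three cost formulations.
% Timings vary by machine; the key point is that all three agree,
% and the dot-product form never materializes a ones vector.
rng(123);
m = 1e6;                 % a large m makes any timing differences visible
X = rand(m, 2);
y = rand(m, 1);
theta = rand(2, 1);

tic;
J_sum = (1 / (2*m)) * sum((X*theta - y).^2);
t_sum = toc;

tic;
J_ones = (1 / (2*m)) * (ones(1, m) * ((X*theta - y).^2));
t_ones = toc;

tic;
d = X*theta - y;
J_dot = (1 / (2*m)) * (d.' * d);
t_dot = toc;

fprintf('sum:  J = %.15g, t = %.4f s\n', J_sum, t_sum);
fprintf('ones: J = %.15g, t = %.4f s\n', J_ones, t_ones);
fprintf('dot:  J = %.15g, t = %.4f s\n', J_dot, t_dot);
```

The three costs should agree up to floating-point rounding, since they are algebraically the same quantity computed in different orders.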