There's nothing wrong with that answer... at least not initially. However, you have hardcoded the ones vector to have 97 elements. You need to ensure that the ones vector is as long as the number of training examples. If your dataset X doesn't have exactly 97 rows, the multiplication won't work, and running this on a differently shaped X would give you an incompatible dimensions error. Therefore, use the total number of training examples m to replace the hardcoded 97:
J= (1/(2*m)) * (ones(1, m) * (((X*theta)-y).^2 ));
Just to be sure that you're getting the right answer, let's create a random X, y and theta with 100 training examples and a two-element parameter vector. We'll use both expressions for the cost and show that they produce the same value:
>> rng(123);
>> X = rand(100, 2);
>> y = rand(100, 1);
>> theta = rand(2, 1);
>> m = size(X, 1);
>> J = 1 / (2 * m) * sum(((X * theta) - y).^2);
>> J2 = (1/(2*m)) * (ones(1, m) * (((X*theta)-y).^2 ));
>> format long g;
>> J
J =
0.0702559647930643
>> J2
J2 =
0.0702559647930643
A word of advice
You have determined that the sum of a vector's elements can be computed by multiplying the vector with an appropriately sized row vector of ones. I would argue that this is less efficient than it needs to be: for this particular cost function, you can instead take the dot product of the vector produced by X*theta - y with itself. The dot product can conveniently be computed as v.' * v, where v is a column vector. This is simply a matrix multiplication where the left operand is a row vector and the right operand is a column vector. I'll let you verify this yourself, but if you work out what this operation does element by element, you'll see it is exactly the dot product.
By virtue of the above formulation, taking the dot product of X*theta - y with itself sums the squared values of every one of its elements. Therefore, do this instead:
d = X*theta - y;
J = (1 / (2*m)) * (d.' * d);
You will also see that you get the same result:
>> d = X*theta - y;
>> J = (1 / (2*m)) * (d.' * d)
J =
0.0702559647930643
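Since the claim above is about efficiency, here's a rough sketch of how you could benchmark the three formulations yourself. The timings from tic/toc are machine dependent and noisy, so treat them as indicative only; the important check is that all three produce the same cost:

```matlab
% Rough benchmark of the three cost formulations.
% Timings vary by machine; the key point is that all three agree,
% and the dot-product form never materializes a ones vector.
rng(123);
m = 1e6;                 % a large m makes any timing differences visible
X = rand(m, 2);
y = rand(m, 1);
theta = rand(2, 1);

tic;
J_sum = (1 / (2*m)) * sum((X*theta - y).^2);
t_sum = toc;

tic;
J_ones = (1 / (2*m)) * (ones(1, m) * ((X*theta - y).^2));
t_ones = toc;

tic;
d = X*theta - y;
J_dot = (1 / (2*m)) * (d.' * d);
t_dot = toc;

fprintf('sum:  J = %.15g, t = %.4f s\n', J_sum, t_sum);
fprintf('ones: J = %.15g, t = %.4f s\n', J_ones, t_ones);
fprintf('dot:  J = %.15g, t = %.4f s\n', J_dot, t_dot);
```

The three costs should agree up to floating-point rounding, since they are algebraically the same quantity computed in different orders.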