
Mean squared error is a popular cost function used in machine learning:

(1/n) * sum((y - pred)**2)

Basically, the order of the subtraction terms doesn't matter, since each difference is squared.
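For example, a quick NumPy check (the array values here are just an illustration):

```python
import numpy as np

# Illustrative values, not taken from any particular model
y = np.array([1.0, 2.0, 3.0])
pred = np.array([1.1, 1.9, 3.2])

# (a - b)**2 == (b - a)**2, so the subtraction order is irrelevant
mse_1 = np.mean((y - pred) ** 2)
mse_2 = np.mean((pred - y) ** 2)
assert np.isclose(mse_1, mse_2)
```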

But if we differentiate this function with respect to y, the difference is no longer squared:

(2/n) * (y - pred)

Would the order make a difference for a neural network?

Reversing the order of the terms y and pred would change the sign of this result. Since we use it to compute the gradient with respect to the weights, would it influence the way the neural network converges?

1 Answer


Well, actually

$$\frac{\partial}{\partial y_i} \frac{1}{n} \sum_j (y_j - \hat{y}_j)^2 = \frac{2}{n} (y_i - \hat{y}_i)$$

and

$$\frac{\partial}{\partial y_i} \frac{1}{n} \sum_j (\hat{y}_j - y_j)^2 = -\frac{2}{n} (\hat{y}_i - y_i) = \frac{2}{n} (y_i - \hat{y}_i),$$

so they're the same.

(I took the derivative with respect to $y_i$, assuming those are the network outputs, but of course the same holds if you differentiate with respect to $\hat{y}_i$.)
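As a quick numerical sanity check, here is a small NumPy sketch (the values are illustrative) showing that both orderings give the same gradient, and therefore the same weight updates:

```python
import numpy as np

# Illustrative values: y are the network outputs, y_hat the targets
y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.1, 1.9, 3.2])
n = len(y)

# Gradient of (1/n) * sum((y - y_hat)**2) with respect to y
grad_1 = (2 / n) * (y - y_hat)

# Gradient of (1/n) * sum((y_hat - y)**2) with respect to y:
# the chain rule contributes a factor of -1 that cancels the swap
grad_2 = -(2 / n) * (y_hat - y)

assert np.allclose(grad_1, grad_2)  # identical gradients either way
```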

cheersmate