
I have a model with N inputs and 6 outputs after each epoch.

My output looks like [x y z xx yy zz], and I want to minimize the MSE of each term. However, I've noticed that when I use MSE as the loss function, it just takes the mean of the squared errors over the entire set.
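For reference, this is roughly the reduction I believe the built-in mse performs, sketched in plain numpy with made-up arrays (pred and true stand in for my predictions and targets):

import numpy as np

# pred and true are (batch, 6) arrays: [x y z xx yy zz] per sample
pred = np.random.rand(4, 6)
true = np.random.rand(4, 6)

# mean over the 6 output terms for each sample, then mean over the batch
per_sample = np.mean((pred - true) ** 2, axis=-1)
batch_mse = np.mean(per_sample)
print(batch_mse)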

lennon310
  • So your output is a concatenation of several outputs? Why not leave the concatenation out of it and use a loss for each output? Your model can have several outputs. – nemo Mar 22 '17 at 22:11
  • @DwightTemple Have you been following the discussion on this question? If I have answered your question, please mark it as accepted. If not, let me know if there has been a misinterpretation. – Autonomous Mar 24 '17 at 16:31
  • That's actually what I started doing. I put my model in a loop and then fit one variable at a time. I'm a newbie with keras, and I'm unsure how to implement multiple outputs. – Dwight Temple Mar 28 '17 at 14:37
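For what it's worth, here is a minimal sketch of the multi-output setup nemo suggests above, using the Keras functional API; the hidden-layer size and the name n_features are placeholders:

from keras.layers import Dense, Input
from keras.models import Model

inp = Input(shape=(n_features,))            # n_features: number of model inputs
hidden = Dense(64, activation='relu')(inp)  # hidden layer size is arbitrary here

# one scalar output per target instead of a single concatenated vector
outputs = [Dense(1, name=name)(hidden)
           for name in ['x', 'y', 'z', 'xx', 'yy', 'zz']]

model = Model(inputs=inp, outputs=outputs)
# the same loss is applied to each output; the total loss is their sum
model.compile(optimizer='adam', loss='mse')

# fit with a list of six target arrays, one per output:
# model.fit(X, [t_x, t_y, t_z, t_xx, t_yy, t_zz], epochs=10)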

2 Answers


You have to create a tensor equal to MSE and minimize that.

mse = tf.reduce_mean(tf.square(outputs - targets))  # mean squared error between predictions and targets
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(mse)  # or any other optimizer
for _ in range(iterations):
    sess.run(train_step, feed_dict=...)  # feed a batch of inputs and targets here
Simba
  • OK for tensorflow but this is posted in the [keras] tag so your answer seems off topic. – nemo Mar 22 '17 at 22:12

I think they both mean the same thing. Let us denote your predictions for the i-th sample by [x_i, y_i, z_i, xx_i, yy_i, zz_i] and the true values by [t_x_i, t_y_i, t_z_i, t_xx_i, t_yy_i, t_zz_i].

Over a batch of N samples, you want to minimize:

L = \frac{1}{N}\sum_{i=1}^{N}(x_i - t_{x_i})^2 + \dots + \frac{1}{N}\sum_{i=1}^{N}(zz_i - t_{zz_i})^2

The MSE loss will minimize the following:

L = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{6}\left[(x_i - t_{x_i})^2 + \dots + (zz_i - t_{zz_i})^2\right]

You can see that the two expressions differ only by the constant factor 1/6, so minimizing one minimizes the other.

I think this holds as long as your six outputs are independent variables, which they appear to be, since you model them as six distinct outputs with six ground-truth labels.
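A quick numeric check of this in numpy with made-up values: the per-output sum and the Keras-style mean differ only by a constant factor of 6, so they have the same minimizer:

import numpy as np

N = 4
pred = np.random.rand(N, 6)   # [x, y, z, xx, yy, zz] for each of N samples
true = np.random.rand(N, 6)

# first loss: sum of the six per-output MSEs
L_separate = np.mean((pred - true) ** 2, axis=0).sum()

# second loss: MSE over the whole (N, 6) block, i.e. the single-output MSE
L_mse = np.mean((pred - true) ** 2)

print(np.allclose(L_separate, 6 * L_mse))   # True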

Autonomous
  • You conveniently ignore the vector dimension which is the problem here. It is actually `mean(mean((pred-true)**2,axis=-1), axis=0)`. – nemo Mar 23 '17 at 00:46
  • @nemo I did not ignore the dimension. His output is a 6-D vector, from `x` to `zz`. I have all six dimensions in my answer (now a little edited). In any case, let us consider your expression. Minimizing `mean(mean((pred-true)**2,axis=-1), axis=0)` means the optimizer will try to minimize each term in `mean((pred-true)**2,axis=-1)` and in turn each term in `(pred-true)**2`, given that each term are independent variables (which I assume they are unless explicitly stated). See [this](http://math.stackexchange.com/q/1635105/64139). – Autonomous Mar 23 '17 at 01:40
  • There is no question that the optimizer will optimize these regardless of the summation. My problem is more with the dimension the mean is taken over. Split outputs would lead to individual normalization `(1/(N*N_x) * f(x) + 1/(N*N_y) * f(y) + ...)` while combined this would be reduced to `(1/(N*(N_x+N_y+...)) ...`, which is not equivalent. – nemo Mar 23 '17 at 07:34
  • @nemo Okay. Now I think I understand where we differ. You think that each of those six outputs are multidimensional. That is why you have `N_x, N_y, ..., N_zz`. However, I do not think OP's setting is similar. He has six outputs, so six neurons in the final FC layer. I assume that each of those six outputs is a scalar. So in my case, `N_x, ..., N_zz` are all equal to one. I will probably write this down in a neat fashion once I get some time. – Autonomous Mar 23 '17 at 17:52
  • Yep. Otherwise the question wouldn't have made sense to me because, as you described, if these are single neurons then there's no problem. Only if the output is a concatenation of multiple layers is there a problem. Nice discussion anyway. We'll see what OP really wants. – nemo Mar 23 '17 at 22:02
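To make the normalization point concrete with made-up shapes: if each output were itself a vector (say x has 2 components and y has 8), the concatenated MSE weights each output by its length, whereas separate losses weight them equally:

import numpy as np

err_x = np.random.rand(4, 2)   # per-sample errors for a 2-dimensional output x
err_y = np.random.rand(4, 8)   # per-sample errors for an 8-dimensional output y

# separate losses: each output is normalized by its own size
separate = np.mean(err_x ** 2) + np.mean(err_y ** 2)

# concatenated output: one mean over all 10 components,
# so y contributes 8/10 of the loss and x only 2/10
combined = np.mean(np.concatenate([err_x, err_y], axis=-1) ** 2)

print(separate, combined)      # not proportional in general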