
My first time using TensorFlow on the MNIST dataset, I had a really simple bug: I forgot to take the mean of my error values before passing them to the optimizer.

In other words, instead of

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_))

I accidentally used

loss = tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_)

Not taking the mean or sum of the error values threw no errors when training the network, however. This got me thinking: is there actually a case where someone would need to pass multiple loss values into an optimizer? What was happening when I passed a Tensor that was not of size [1] into minimize()?
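To make the difference concrete, here is a NumPy sketch (not TensorFlow itself) of what the two calls return. The per-example softmax cross-entropy is one value per row of logits, so without the reduction you get a vector of shape [batch_size] rather than a scalar:

```python
import numpy as np

np.random.seed(0)
batch, classes = 4, 3
logits = np.random.randn(batch, classes)
labels = np.eye(classes)[np.random.randint(classes, size=batch)]  # one-hot

# Per-example softmax cross-entropy (what the un-reduced call returns):
# subtract the row max for numerical stability, then take log-softmax.
shifted = logits - logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
per_example = -(labels * log_probs).sum(axis=1)
print(per_example.shape)   # (4,) -- one loss value per example

# What reduce_mean collapses it to: a single scalar.
loss = per_example.mean()
print(np.shape(loss))      # ()
```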

ejlu

1 Answer


They are being added up. This is a side effect of TensorFlow using reverse-mode automatic differentiation, which requires the loss to be a scalar.
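A NumPy sketch of the practical consequence, assuming this sum-of-elements behavior: the gradient of the summed loss vector is just batch_size times the gradient of the mean, so training still works, but the effective learning rate is scaled by the batch size. The toy per-example loss (w - x_i)^2 here is my own illustration, not from the question:

```python
import numpy as np

# Toy model: a single scalar weight w, per-example loss_i = (w - x_i)^2.
x = np.array([1.0, 2.0, 3.0, 4.0])
w = 0.5

# Gradient of each per-example loss with respect to w.
per_example_grads = 2 * (w - x)

# Summing the loss vector (what the optimizer effectively does with a
# non-scalar loss) gives the sum of the per-example gradients...
grad_of_sum = per_example_grads.sum()

# ...which is batch_size times the gradient of the mean loss.
grad_of_mean = per_example_grads.mean()
assert np.isclose(grad_of_sum, len(x) * grad_of_mean)
```

So forgetting reduce_mean does not break differentiation; it just takes larger steps, as if the learning rate were multiplied by the batch size.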

Yaroslav Bulatov