The first time I used TensorFlow on the MNIST dataset, I had a really simple bug: I forgot to take the mean of my error values before passing them to the optimizer.
In other words, instead of
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_))
I accidentally used
loss = tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_)
Not taking the mean or sum of the error values raised no errors when training the network, however. This got me thinking: is there actually a case where someone would need to pass multiple loss values to an optimizer? What was happening when I passed a non-scalar Tensor (one loss value per example rather than a single number) into minimize()?
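
To make the difference between the two lines concrete, here is a minimal pure-Python sketch. It is not TensorFlow itself: the helper `softmax_cross_entropy` and the tiny batch are made up for illustration, assuming one-hot labels. It only shows that the unreduced version yields one loss value per example, while the mean collapses them into a single number.

```python
import math

def softmax_cross_entropy(logits, labels):
    """Hypothetical helper: per-example cross-entropy between softmax(logits) and one-hot labels."""
    losses = []
    for row, target in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        # -sum(target * log_softmax(row))
        losses.append(-sum(t * (v - log_z) for v, t in zip(row, target)))
    return losses

# Made-up batch: 3 examples, 2 classes
logits = [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
labels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

per_example = softmax_cross_entropy(logits, labels)  # what I passed by accident: one value per example
mean_loss = sum(per_example) / len(per_example)      # what I meant to pass: a single scalar

print(len(per_example), mean_loss)
```

So the buggy version handed the optimizer a whole vector of losses, one entry per example in the batch, instead of one scalar.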