What's the primary difference between tf.nn.softmax_cross_entropy_with_logits and tf.losses.log_loss? Both methods accept one-hot labels and logits to calculate cross-entropy loss for classification tasks.

2 Answers
These methods are not that different in theory; however, they have a number of differences in implementation:
1) tf.nn.softmax_cross_entropy_with_logits is designed for single-class (mutually exclusive) labels, while tf.losses.log_loss can also be used for multi-label classification, where more than one class is positive per example. tf.nn.softmax_cross_entropy_with_logits won't throw an error if you feed it multi-label targets, but your gradients won't be calculated correctly and training will most probably fail.
From official documentation:
NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.
2) tf.nn.softmax_cross_entropy_with_logits calculates (as you can see from the name) the softmax function on top of your logits first, while log_loss doesn't do this.
3) tf.losses.log_loss has slightly wider functionality, in the sense that you can weight each element of the loss function, and you can specify epsilon, which is used in the calculation to avoid log(0).
4) Finally, tf.nn.softmax_cross_entropy_with_logits returns a loss for every entry in the batch, while tf.losses.log_loss returns a reduced value (by default the sum over all elements divided by the number of non-zero weights) which can be used directly in an optimizer. Both behaviours are sketched below.
UPD: Another difference is in the way they calculate the loss. Logarithmic loss also takes the negative classes into account (those where you have 0s in the vector). In short, cross-entropy loss forces the network to produce the maximum input for the correct class and does not care about the negative classes. Logarithmic loss does both at the same time: it forces the correct classes to have larger values and the negative ones to have smaller values. In mathematical notation it looks as follows:
Cross-entropy loss: L_CE = -sum_i( y_i * log(p_i) )
Logarithmic loss: L_log = -sum_i( y_i * log(p_i) + (1 - y_i) * log(1 - p_i) )
where i is the corresponding class, y_i is the label and p_i is the predicted probability.
So for example, if you have labels = [1, 0] and predictions_with_softmax = [0.7, 0.3], then:
1) Cross-entropy loss: -(1 * log(0.7) + 0 * log(0.3)) = 0.3567
2) Logarithmic loss: -(1 * log(0.7) + (1 - 1) * log(1 - 0.7) + 0 * log(0.3) + (1 - 0) * log(1 - 0.3)) = -(log(0.7) + log(0.7)) = 0.7133
And if you use the default reduction for tf.losses.log_loss, its output is divided by the number of non-zero weights (here it's 2). So finally: tf.losses.log_loss = 0.7133 / 2 = 0.3566
In this case we got equal outputs; however, that is not always the case.
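To reproduce these numbers with the actual ops, here is a small TF 1.x-style sketch; the logit values are chosen by hand so that the softmax is approximately [0.7, 0.3]:

```python
import tensorflow as tf

labels = tf.constant([[1.0, 0.0]])
# chosen so that softmax(logits) ≈ [0.7, 0.3]
logits = tf.constant([[0.8473, 0.0]])
probs = tf.nn.softmax(logits)

# -(1*log(0.7) + 0*log(0.3)) ≈ 0.3567, one value per batch entry
ce = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# -(log(0.7) + log(0.7)) / 2 ≈ 0.3567 with the default reduction
ll = tf.losses.log_loss(labels=labels, predictions=probs)

with tf.Session() as sess:
    print(sess.run([ce, ll]))   # ≈ [array([0.3567]), 0.3567]
```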

- by your explanation, does that mean `tf.nn.softmax_cross_entropy_with_logits` is equivalent to `tf.reduce_sum(tf.losses.log_loss(tf.nn.softmax(logits)))`? – mynameisvinn Nov 12 '17 at 14:29
- @vin no, not really, but they are close to each other; log_loss penalises negative samples as well. So if you have label_vector = [1,0,0], softmax_cross_entropy_with_logits will only calculate the loss for the first class and ignore the others, while log loss will calculate the losses for the negative classes as well. In other words, softmax_cross_entropy_with_logits will tend to make the true class have the maximum value, while log_loss will tend to maximize the true classes and minimize the negative ones as well. This is not a big difference in terms of training in the end. Let me update my answer if you need. – asakryukin Nov 12 '17 at 16:00
- @vin please check the update, I hope it's clearer now – asakryukin Nov 12 '17 at 16:23
- How did you come up with the CE loss formula? What are you summing over? – Xyand Feb 19 '18 at 20:26
- @Xyand this is the formula for a single instance, so we are summing over the predicted classes. For a batch you will then have an array of those losses, one per sample, which you reduce later (probably with a mean). Is that what you're asking? – asakryukin Feb 21 '18 at 03:50
- Thanks. I think you got the UPD part wrong (see the first answer + comments: https://stats.stackexchange.com/questions/166958/multinomial-logistic-loss-vs-cross-entropy-vs-square-error). Theoretically log loss and CE are the same; the two formulas you wrote are the same but the notations are different (not the same yi). That's why I asked whether your answer is based on the actual implementation or a theoretical point of view. – Xyand Feb 21 '18 at 08:40
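For the equivalence asked about in the comments above, a quick TF 1.x-style check (with made-up three-class values) suggests the two are close in spirit but not numerically equal, since log_loss also penalises the negative classes and averages over all elements:

```python
import tensorflow as tf

labels = tf.constant([[1.0, 0.0, 0.0]])
logits = tf.constant([[2.0, 1.0, 0.5]])
probs = tf.nn.softmax(logits)       # ≈ [0.63, 0.23, 0.14]

ce = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# tf.losses.log_loss already returns a reduced scalar,
# so wrapping it in tf.reduce_sum changes nothing
ll = tf.reduce_sum(tf.losses.log_loss(labels=labels, predictions=probs))

with tf.Session() as sess:
    print(sess.run([ce, ll]))   # ≈ [array([0.464]), 0.293] -- not equal
```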
There are basically two differences between them:
1) The labels used in tf.nn.softmax_cross_entropy_with_logits are the one-hot version of the labels used in tf.losses.log_loss.
2) tf.nn.softmax_cross_entropy_with_logits calculates the softmax of the logits internally before calculating the cross-entropy (see the sketch below).
Notice that tf.losses.log_loss also accepts one-hot encoded labels. However, tf.nn.softmax_cross_entropy_with_logits only accepts one-hot encoded labels.
Hope this helps.
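A minimal TF 1.x-style sketch of point 2), with made-up label and logit values: the built-in op matches a manual computation that applies tf.nn.softmax first and then the cross-entropy formula.

```python
import tensorflow as tf

labels = tf.constant([[0.0, 1.0, 0.0]])
logits = tf.constant([[0.5, 2.0, -1.0]])

# Built-in op: softmax is applied to the logits internally
ce = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# Manual version: softmax first, then -sum(labels * log(probs))
manual = -tf.reduce_sum(labels * tf.log(tf.nn.softmax(logits)), axis=1)

with tf.Session() as sess:
    print(sess.run([ce, manual]))   # the two values match
```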

- by your explanation, does that mean `tf.nn.softmax_cross_entropy_with_logits` is equivalent to `tf.losses.log_loss(tf.nn.softmax(logits))`? – mynameisvinn Nov 12 '17 at 14:26