What's the primary difference between tf.nn.softmax_cross_entropy_with_logits and tf.losses.log_loss? Both methods accept one-hot labels and logits to calculate cross-entropy loss for classification tasks.

2 Answers
These methods are not that different in theory; however, they have a number of differences in implementation:
1) tf.nn.softmax_cross_entropy_with_logits is designed for single-class (mutually exclusive) labels, while tf.losses.log_loss can also be used for multi-label classification, where more than one class is positive per example. tf.nn.softmax_cross_entropy_with_logits won't throw an error if you feed it multi-label targets, but your gradients won't be calculated correctly and training will most probably fail.
From official documentation:
NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.
2) tf.nn.softmax_cross_entropy_with_logits calculates (as you can see from the name) the softmax function on top of your logits first, while log_loss doesn't do this.
3) tf.losses.log_loss has slightly wider functionality, in the sense that you can weight each element of the loss function, and you can specify epsilon, which is used in the calculation to avoid log(0).
4) Finally, tf.nn.softmax_cross_entropy_with_logits returns a loss for every entry in the batch, while tf.losses.log_loss returns a reduced value (by default the sum over all elements divided by the number of non-zero weights) which can be used directly in an optimizer. Both behaviours are sketched below.
UPD: Another difference is in the way they calculate the loss. Logarithmic loss also takes the negative classes into account (those where you have 0s in the vector). In short, cross-entropy loss forces the network to produce the maximum input for the correct class and does not care about the negative classes. Logarithmic loss does both at the same time: it forces the correct classes to have larger values and the negative ones to have smaller values. In mathematical notation it looks as follows:
Cross-entropy loss: L_CE = -sum_i( y_i * log(p_i) )
Logarithmic loss: L_log = -sum_i( y_i * log(p_i) + (1 - y_i) * log(1 - p_i) )
where i is the corresponding class, y_i is the label and p_i is the predicted probability.
So for example, if you have labels = [1, 0] and predictions_with_softmax = [0.7, 0.3], then:
1) Cross-entropy loss: -(1 * log(0.7) + 0 * log(0.3)) = 0.3567
2) Logarithmic loss: -(1 * log(0.7) + (1 - 1) * log(1 - 0.7) + 0 * log(0.3) + (1 - 0) * log(1 - 0.3)) = -(log(0.7) + log(0.7)) = 0.7133
And if you use the default reduction for tf.losses.log_loss, its output is divided by the number of non-zero weights (here it's 2). So finally: tf.losses.log_loss = 0.7133 / 2 = 0.3566
In this case we got equal outputs; however, that is not always the case.
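To reproduce these numbers with the actual ops, here is a small TF 1.x-style sketch; the logit values are chosen by hand so that the softmax is approximately [0.7, 0.3]:

```python
import tensorflow as tf

labels = tf.constant([[1.0, 0.0]])
# chosen so that softmax(logits) ≈ [0.7, 0.3]
logits = tf.constant([[0.8473, 0.0]])
probs = tf.nn.softmax(logits)

# -(1*log(0.7) + 0*log(0.3)) ≈ 0.3567, one value per batch entry
ce = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# -(log(0.7) + log(0.7)) / 2 ≈ 0.3567 with the default reduction
ll = tf.losses.log_loss(labels=labels, predictions=probs)

with tf.Session() as sess:
    print(sess.run([ce, ll]))   # ≈ [array([0.3567]), 0.3567]
```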

- by your explanation, does that mean `tf.nn.softmax_cross_entropy_with_logits` is equivalent to `tf.reduce_sum(tf.losses.log_loss(tf.nn.softmax(logits)))`? – mynameisvinn Nov 12 '17 at 14:29
- @vin no, not really, but they are close to each other; log_loss penalises negative samples as well. So if you have label_vector = [1,0,0], softmax_cross_entropy_with_logits will only calculate the loss for the first class and ignore the others, while log loss will calculate the losses for the negative classes as well. In other words, softmax_cross_entropy_with_logits will tend to make the true class have the maximum value, while log_loss will tend to maximize the true classes and minimize the negative ones as well. This is not a big difference in terms of training in the end. Let me update my answer if you need. – asakryukin Nov 12 '17 at 16:00
- @vin please check the update, I hope it's clearer now – asakryukin Nov 12 '17 at 16:23
- How did you come up with the CE loss formula? What are you summing over? – Xyand Feb 19 '18 at 20:26
- @Xyand this is the formula for a single instance, so we are summing over the predicted classes. For a batch you will then have an array of those losses, one per sample, which you reduce later (probably with a mean). Is that what you're asking? – asakryukin Feb 21 '18 at 03:50
- Thanks. I think you got the UPD part wrong (see the first answer + comments: https://stats.stackexchange.com/questions/166958/multinomial-logistic-loss-vs-cross-entropy-vs-square-error). Theoretically log loss and CE are the same; the two formulas you wrote are the same but the notations are different (not the same yi). That's why I asked whether your answer is based on the actual implementation or a theoretical point of view. – Xyand Feb 21 '18 at 08:40
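For the equivalence asked about in the comments above, a quick TF 1.x-style check (with made-up three-class values) suggests the two are close in spirit but not numerically equal, since log_loss also penalises the negative classes and averages over all elements:

```python
import tensorflow as tf

labels = tf.constant([[1.0, 0.0, 0.0]])
logits = tf.constant([[2.0, 1.0, 0.5]])
probs = tf.nn.softmax(logits)       # ≈ [0.63, 0.23, 0.14]

ce = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# tf.losses.log_loss already returns a reduced scalar,
# so wrapping it in tf.reduce_sum changes nothing
ll = tf.reduce_sum(tf.losses.log_loss(labels=labels, predictions=probs))

with tf.Session() as sess:
    print(sess.run([ce, ll]))   # ≈ [array([0.464]), 0.293] -- not equal
```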
There are basically two differences between them:
1) The labels used in tf.nn.softmax_cross_entropy_with_logits are the one-hot version of the labels used in tf.losses.log_loss.
2) tf.nn.softmax_cross_entropy_with_logits calculates the softmax of the logits internally before calculating the cross-entropy (see the sketch below).
Notice that tf.losses.log_loss also accepts one-hot encoded labels. However, tf.nn.softmax_cross_entropy_with_logits only accepts one-hot encoded labels.
Hope this helps.
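A minimal TF 1.x-style sketch of point 2), with made-up label and logit values: the built-in op matches a manual computation that applies tf.nn.softmax first and then the cross-entropy formula.

```python
import tensorflow as tf

labels = tf.constant([[0.0, 1.0, 0.0]])
logits = tf.constant([[0.5, 2.0, -1.0]])

# Built-in op: softmax is applied to the logits internally
ce = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# Manual version: softmax first, then -sum(labels * log(probs))
manual = -tf.reduce_sum(labels * tf.log(tf.nn.softmax(logits)), axis=1)

with tf.Session() as sess:
    print(sess.run([ce, manual]))   # the two values match
```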

- by your explanation, does that mean `tf.nn.softmax_cross_entropy_with_logits` is equivalent to `tf.losses.log_loss(tf.nn.softmax(logits))`? – mynameisvinn Nov 12 '17 at 14:26