I am trying to implement a network which has the following loss function definition in PyTorch:
logits = F.log_softmax(layer_output, dim=1)
loss = F.nll_loss(logits, labels)
This link https://discuss.pytorch.org/t/pytorch-equivalence-to-sparse-softmax-cross-entropy-with-logits-in-tensorflow/18727 mentions that log_softmax should be used instead of softmax before computing the NLL loss, because it is numerically more stable.
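If it helps, here is a minimal self-contained version of that pairing (the shapes and tensor values below are just placeholders, not my real data); as far as I can tell, F.cross_entropy fuses the same two steps:

import torch
import torch.nn.functional as F

# placeholder batch: 4 samples, 3 classes
layer_output = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])

log_probs = F.log_softmax(layer_output, dim=1)  # log-probabilities
loss = F.nll_loss(log_probs, labels)

# F.cross_entropy combines log_softmax and nll_loss internally
assert torch.allclose(loss, F.cross_entropy(layer_output, labels))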
In TensorFlow I have the following code:
logits = tf.nn.log_softmax(layer_output)
loss = tf.losses.log_loss(logits, labels)
This gives a NaN loss from the first iteration. If I use tf.nn.softmax instead, I don't get NaN values. But the link says the log_softmax route should be the more stable one. Is there a specific reason for this? I could get rid of the NaNs using tf.clip_by_value, but that leads to vanishing gradients.
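For completeness, this is roughly how I applied the clipping workaround; the input tensors, clip bounds, and exact placement are placeholders here (my real network and data are larger), so treat it as a sketch rather than my actual code:

import numpy as np
import tensorflow as tf

# placeholder stand-ins for my real network output and one-hot labels
layer_output = tf.constant(np.random.randn(4, 3), dtype=tf.float32)
labels = tf.constant([[1., 0., 0.],
                      [0., 0., 1.],
                      [0., 1., 0.],
                      [1., 0., 0.]])

log_probs = tf.nn.log_softmax(layer_output)
# clip to a finite range before the loss; these bounds are made up
clipped = tf.clip_by_value(log_probs, -100.0, 0.0)
loss = tf.losses.log_loss(clipped, labels)  # same call order as in my code above

with tf.Session() as sess:
    print(sess.run(loss))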