
I am trying to implement a network that has the following loss function definition in PyTorch:

logits = F.log_softmax(layer_output)
loss = F.nll_loss(logits, labels)

This link https://discuss.pytorch.org/t/pytorch-equivalence-to-sparse-softmax-cross-entropy-with-logits-in-tensorflow/18727 mentions that log_softmax should be used instead of softmax, as it is numerically more stable, before calculating the NLL loss.
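
For instance, here is a small PyTorch check (my own toy numbers) of the stability point the link makes: with large raw scores, softmax underflows to exactly 0, so taking its log afterwards gives -inf, while log_softmax stays finite.

import torch
import torch.nn.functional as F

scores = torch.tensor([[1000.0, 0.0]])   # extreme raw scores

probs = F.softmax(scores, dim=-1)        # second entry underflows to 0.0
print(torch.log(probs))                  # tensor([[0., -inf]])

print(F.log_softmax(scores, dim=-1))     # tensor([[0., -1000.]]) -- stays finite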

In TensorFlow I have the following code:

logits = tf.nn.log_softmax(layer_output)
loss = tf.losses.log_loss(logits, labels)

This leads to a NaN loss value from the first iteration. If I use tf.nn.softmax instead, I don't get a NaN value. But the link says log_softmax is more stable. Is there a specific reason for this? I could get rid of the NaNs using tf.clip_by_value, but that leads to vanishing gradients.
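
Here is a minimal standalone version of what I mean (TF 1.x; I've assumed one-hot float labels and written the calls with the documented argument order of tf.losses.log_loss, labels first and predictions second):

import tensorflow as tf

layer_output = tf.constant([[2.0, -1.0, 0.5]])    # raw scores from the last layer
labels = tf.constant([[1.0, 0.0, 0.0]])           # one-hot targets (assumed)

log_probs = tf.nn.log_softmax(layer_output)       # all values are <= 0
probs = tf.nn.softmax(layer_output)               # values are in (0, 1)

nan_loss = tf.losses.log_loss(labels, log_probs)  # log_loss takes log() of its
                                                  # predictions, which are negative here
ok_loss = tf.losses.log_loss(labels, probs)

with tf.Session() as sess:
    print(sess.run([nan_loss, ok_loss]))          # first value comes out as nan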

  • You could use tf.nn.softmax_cross_entropy_with_logits_v2. This takes in the raw logits, puts them through softmax and then calculates the cross-entropy loss. https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2 – Burton2000 Dec 11 '18 at 17:19
  • https://stackoverflow.com/questions/47245113/whats-the-difference-between-softmax-cross-entropy-with-logits-and-losses-log-l . If I understand correctly, using log_loss should give better results as it also accounts for the negative examples. I have tried tf.nn.softmax and then feeding those outputs to log_loss, which gives reasonable loss values. But as the original implementation has log_softmax, I am curious why that gives me NaN loss values. – renderbender Dec 12 '18 at 09:59

1 Answer


The naming convention here is wrong:

logits are the raw scores that should go into softmax or log_softmax (the log of softmax); what log_softmax returns are log-probabilities, not logits.

These two lines are equivalent:

r = F.nll_loss(F.log_softmax(a, -1), p)
r = F.cross_entropy(a, p)
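
For example, a quick self-contained check (shapes made up):

import torch
import torch.nn.functional as F

a = torch.randn(4, 10)            # raw scores ("logits") for 4 samples, 10 classes
p = torch.randint(0, 10, (4,))    # integer class targets

r1 = F.nll_loss(F.log_softmax(a, -1), p)
r2 = F.cross_entropy(a, p)
print(torch.allclose(r1, r2))     # True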

What you are looking for is F.cross_entropy in PyTorch, or tf.nn.softmax_cross_entropy_with_logits in TensorFlow.
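
For example, a rough TF 1.x sketch using the _v2 variant mentioned in the comments (toy tensors; the sparse variant is the closer match when the labels are integer class indices, which is what F.nll_loss expects):

import tensorflow as tf  # TF 1.x

layer_output = tf.constant([[2.0, -1.0, 0.5]])   # raw scores (the real logits)

# if labels are one-hot vectors
onehot_labels = tf.constant([[1.0, 0.0, 0.0]])
loss_onehot = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=onehot_labels,
                                                logits=layer_output))

# if labels are integer class indices (the direct analogue of F.cross_entropy)
index_labels = tf.constant([0])
loss_sparse = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=index_labels,
                                                    logits=layer_output))

with tf.Session() as sess:
    print(sess.run([loss_onehot, loss_sparse]))  # same value for these toy inputs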

prosti