
Since the source code of tf.nn.softmax_cross_entropy_with_logits in gen_nn_ops is hidden, could anyone explain how TensorFlow computes the cross entropy after the softmax? I mean, because of finite precision the softmax can output exact zeros, which would cause a NaN problem in the cross entropy. Does TensorFlow clip the softmax output to bound it away from 0?
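
To make the concern concrete, here is a rough TF 1.x sketch of the naive formula I have in mind (hand-rolled, not the library op):

import tensorflow as tf

logits = tf.constant([10.0, 50.0, 100.0, 200.0])
labels = tf.constant([1.0, 0.0, 0.0, 0.0])
probs = tf.nn.softmax(logits)                         # probs ~ [0., 0., 0., 1.]; the first entry is exactly 0
naive_xent = -tf.reduce_sum(labels * tf.log(probs))   # 1 * log(0) -> -inf, 0 * log(0) -> nan

with tf.Session() as sess:
    print(sess.run(naive_xent))                       # inf or nan instead of a finite loss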

Nan

1 Answer


The implementation of tf.nn.softmax_cross_entropy_with_logits goes down to native C++ code (there is also an XLA implementation). The logits are not bounded, and an output of exactly 0 is possible when one of the logits is much bigger than the others. Example:

>>> session.run(tf.nn.softmax([10.0, 50.0, 100.0, 200.0]))
array([ 0.,  0.,  0.,  1.], dtype=float32)
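
The fused op avoids the NaN not by clipping but by working on the logits directly: the cross entropy is computed from log-softmax in its log-sum-exp form, so the underflowed softmax value is never passed to a log. A rough sketch of that formulation (an illustration of the standard technique, not the actual kernel code):

import tensorflow as tf

logits = tf.constant([10.0, 50.0, 100.0, 200.0])
labels = tf.constant([1.0, 0.0, 0.0, 0.0])

# log-softmax = logits - logsumexp(logits); shifting by the max keeps every
# exp() argument <= 0, so nothing overflows and log() never sees a hard zero
shifted = logits - tf.reduce_max(logits)
log_softmax = shifted - tf.log(tf.reduce_sum(tf.exp(shifted)))
stable_xent = -tf.reduce_sum(labels * log_softmax)

with tf.Session() as sess:
    print(sess.run(stable_xent))  # 190.0 -- finite, even though softmax(logits)[0] == 0

The gradient of this fused form is also well behaved (softmax(logits) - labels), which is another reason to prefer it over composing tf.nn.softmax with a hand-written log.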

If you wish, you can clip the logits just before the softmax, but it's not recommended, because clipping kills the gradient once a logit hits the bound. A better option is to use batch normalization so the activations stay closer to normally distributed and the logits never become extreme in the first place.
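
For reference, a hedged sketch of both options on a toy graph (layer sizes and placeholder names are just illustrative):

import tensorflow as tf

num_classes = 4
x = tf.placeholder(tf.float32, [None, 128])            # hypothetical input features
labels = tf.placeholder(tf.float32, [None, num_classes])
is_training = tf.placeholder(tf.bool)

# Preferred: batch-normalize the activations feeding the logits layer so the
# logits stay in a moderate range (remember to run tf.GraphKeys.UPDATE_OPS
# alongside the train op so the moving statistics get updated).
hidden = tf.layers.dense(x, 64, activation=tf.nn.relu)
hidden = tf.layers.batch_normalization(hidden, training=is_training)
logits = tf.layers.dense(hidden, num_classes)

# Possible but not recommended: clip the logits; a logit stuck at the bound
# gets exactly zero gradient.
clipped_logits = tf.clip_by_value(logits, -50.0, 50.0)

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))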

Maxim