NaN with softmax cross entropy in simple model with dummy inputs

Question

I was simplifying my model in order to see where the NaN error occurs and narrowed it down to my loss function:

import tensorflow as tf
from tensorflow.python import debug as tf_debug

def train_input_fn():
  pass


def model_fn(features, labels, mode, params):

  classes = 225
  enc = tf.ones((1,20,1024), dtype=tf.float16)
  labels = tf.ones((1,20), dtype=tf.int32)

  logits = tf.layers.dense(enc, classes)
  loss = tf.reduce_sum(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)) / 20
  train_op = tf.train.AdamOptimizer(learning_rate=0.00001, beta1=0.9, beta2=0.999).minimize(loss)

  return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)


if __name__ == '__main__':

  model_directory = 'path/to/logdir'  # placeholder; must be a string path
  hooks = [tf_debug.LocalCLIDebugHook(ui_type="readline")]

  classifier = tf.estimator.Estimator(
      model_fn=model_fn,
      model_dir=model_directory,
      params={},
  )

  classifier.train(input_fn=train_input_fn, hooks=hooks)

After the third or fourth 'run' command in the TensorFlow debugger, on a fresh model directory, I get 'NaN loss during training.' I have already tried a very low learning rate, but nothing changed. I'm using tensorflow-gpu 1.8.

score 1 · Accepted Answer · answered Jun 22 '18 at 09:09

I tried your code and got NaN right from the first step.

I then checked the official documentation for tf.nn.sparse_softmax_cross_entropy_with_logits, which says:

logits: Unscaled log probabilities of shape [d_0, d_1, ..., d_{r-1}, num_classes] and dtype float32 or float64.

I changed enc = tf.ones((1,20,1024), dtype=tf.float16) to enc = tf.ones((1,20,1024), dtype=tf.float32) and it worked!

End-2-End


While the documentation states float32 or float64, it works fine for me with float16. My issue seems to be with the Adam optimizer, as stated in the other answer. Thank you nevertheless for your insightful answer! – user2368505 Jun 23 '18 at 23:35
Float16 would work, but only with changes to internal defaults, such as the epsilon you mentioned. Unless you are under strict memory constraints, I'd recommend float32 or float64, because other ops might natively expect their input in that format. – End-2-End Jun 24 '18 at 07:58

score 0 · Answer 2 · answered Jun 22 '18 at 13:00

Using tf.float16 for the variables that Adam optimizes makes a larger epsilon necessary for numerical stability. When I pass epsilon=1e-04 (the default is 1e-08) to the Adam optimizer, it works for me.
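
One plausible reading of why the larger epsilon helps: 1e-08 is below the smallest positive value float16 can represent (about 6e-08), so it rounds to zero when converted to half precision. The sketch below checks that using only the Python standard library, with no TensorFlow involved. The helpers to_float16 and adam_denominator are hypothetical names made up for this illustration, and the denominator is a simplified stand-in for Adam's sqrt(v) + epsilon term (it ignores the moving average and bias correction), not TensorFlow's actual implementation.

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The struct format "e" packs and unpacks a 16-bit float.
    return struct.unpack("e", struct.pack("e", x))[0]


def adam_denominator(grad, epsilon):
    """Return sqrt(v) + epsilon with every intermediate rounded to float16.

    Simplifying assumption: v is taken to be grad**2, ignoring Adam's
    moving average and bias correction.
    """
    v = to_float16(grad * grad)
    return to_float16(to_float16(v ** 0.5) + to_float16(epsilon))


default_eps = to_float16(1e-8)  # underflows: float16 cannot represent it
larger_eps = to_float16(1e-4)   # representable in float16

print(default_eps)       # 0.0
print(larger_eps > 0.0)  # True

# A small gradient whose square also underflows in float16.
small_grad = 1e-4
print(adam_denominator(small_grad, 1e-8))        # 0.0, so an update would divide by zero
print(adam_denominator(small_grad, 1e-4) > 0.0)  # True
```

If the squared gradient and epsilon both round to zero, the update divides by zero, which is consistent with the NaN loss seen in the question. With epsilon=1e-04 the denominator stays positive.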

user2368505
