
I have implemented a neural network in TensorFlow where the last layer is a convolutional layer. I feed the output of this convolutional layer into a softmax activation function and then into a cross-entropy loss function (defined below, along with the labels). The problem is that I get NaN as the output of my loss function, and I figured out it is because the softmax output contains 1. So, my question is: what should I do in this case? My input is a 16-by-16 image where each pixel is either 0 or 1 (binary classification).

My loss function:

import tensorflow as tf

#Loss function
def loss(prediction, label):
    #with tf.variable_scope("Loss") as Loss_scope:
    log_pred = tf.log(prediction, name='Prediction_Log')
    log_pred_2 = tf.log(1-prediction, name='1-Prediction_Log')
    cross_entropy = -tf.multiply(label, log_pred) - tf.multiply((1-label), log_pred_2)

    return cross_entropy
MRM
    You should use [`tf.nn.softmax_cross_entropy_with_logits_v2`](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2) or [`tf.losses.softmax_cross_entropy`](https://www.tensorflow.org/api_docs/python/tf/losses/softmax_cross_entropy) for that, using the outputs of the last layer _before_ the softmax activation (the "logits"). Those functions are designed to handle extreme cases correctly. – jdehesa Jun 21 '19 at 09:56
  • @jdehesa Good point! :-) I should really have included a pointer to the out-of-the-box functions in my answer. I assumed the OP's question was about implementing her own loss fn – Stewart_R Jun 21 '19 at 12:05
  • Updated the answer now with a note about the out-of-the-box functions handling this nicely – Stewart_R Jun 21 '19 at 12:05
  • @jdehesa, I have already tried those (without softmax, as the documentation says), but the problem is that my loss is zero and so my model does not learn. – MRM Jun 22 '19 at 01:15

1 Answer


Note that log(0) is undefined, so if prediction==0 (or prediction==1, which makes 1-prediction zero) you will get a NaN in the loss.

To get around this, it is commonplace to add a very small value, epsilon, to the argument of tf.log in any loss function (we do a similar thing when dividing, to avoid division by zero). This makes the loss function numerically stable, and epsilon is small enough that any inaccuracy it introduces into the loss is negligible.

Perhaps try something like:

import tensorflow as tf

#Loss function
def loss(prediction, label):
    #with tf.variable_scope("Loss") as Loss_scope:

    # Small constant keeps the arguments of tf.log strictly positive
    epsilon = tf.constant(0.000001)
    log_pred = tf.log(prediction + epsilon, name='Prediction_Log')
    log_pred_2 = tf.log(1-prediction + epsilon, name='1-Prediction_Log')

    # Element-wise binary cross-entropy, now NaN-free at 0 and 1
    cross_entropy = -tf.multiply(label, log_pred) - tf.multiply((1-label), log_pred_2)
    return cross_entropy
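
For completeness, here is a minimal usage sketch, continuing from the loss() above and assuming TensorFlow 1.x; the placeholder names and the [None, 16, 16, 1] shape are my own assumptions based on the 16-by-16 binary images described in the question:

# Assumed: a batch of per-pixel probabilities from the network and the
# matching binary label maps (shapes taken from the question's description)
prediction = tf.placeholder(tf.float32, shape=[None, 16, 16, 1], name='prediction')
label = tf.placeholder(tf.float32, shape=[None, 16, 16, 1], name='label')

# loss() returns a per-pixel cross-entropy tensor; average it to a single
# scalar before handing it to an optimizer
per_pixel_ce = loss(prediction, label)
mean_ce = tf.reduce_mean(per_pixel_ce)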

UPDATE:

As jdehesa points out in the comments, though, the 'out of the box' loss functions (tf.nn.softmax_cross_entropy_with_logits_v2 and tf.losses.softmax_cross_entropy, applied to the logits, i.e. the outputs of the last layer before the softmax) already handle this numerical-stability issue nicely.
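
For example, here is a minimal sketch of that approach, assuming TensorFlow 1.x, a final conv layer with two output channels (one per class) and one-hot labels of the same shape; those shapes are my assumptions, not something given in the question:

import tensorflow as tf

# Raw outputs of the final conv layer *before* any softmax (the "logits").
# Two class channels and the 16x16 spatial size are assumed here.
logits = tf.placeholder(tf.float32, shape=[None, 16, 16, 2], name='logits')
# One-hot labels with the same shape as the logits
labels = tf.placeholder(tf.float32, shape=[None, 16, 16, 2], name='labels')

# The built-in op applies softmax internally in a numerically stable way,
# so do not apply softmax to `logits` yourself first
per_pixel_ce = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits)
mean_ce = tf.reduce_mean(per_pixel_ce)

As an aside, if the final layer has only a single output channel, a softmax over that one channel is always exactly 1 and this loss comes out identically zero; that may be one explanation for the zero loss mentioned in the comments, and a two-channel (one-hot) setup like the one sketched above avoids it.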

Stewart_R