
I've written an LSTM model using Keras, with the LeakyReLU advanced activation:

    # imports assumed by this snippet
    from keras import optimizers
    from keras.models import Sequential
    from keras.layers import LSTM, LeakyReLU, Dropout, Flatten, Dense
    import keras_metrics  # pip package providing precision()/recall()

    # ADAM optimizer with learning rate decay
    opt = optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0001)

    # build the model
    model = Sequential()

    # data has shape (samples, timesteps, features)
    num_features = data.shape[2]
    num_samples = data.shape[1]  # the timestep dimension, despite the name

    model.add(
        LSTM(16, batch_input_shape=(None, num_samples, num_features), return_sequences=True, activation='linear'))
    model.add(LeakyReLU(alpha=.001))
    model.add(Dropout(0.1))
    model.add(LSTM(8, return_sequences=True, activation='linear'))
    model.add(Dropout(0.1))
    model.add(LeakyReLU(alpha=.001))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))

    # f1 is a custom metric defined elsewhere
    model.compile(loss='binary_crossentropy', optimizer=opt,
                  metrics=['accuracy', keras_metrics.precision(), keras_metrics.recall(), f1])

My data is a balanced, binary-labeled set, i.e. 50% labeled 1 and 50% labeled 0. I've used `activation='linear'` for the LSTM layers preceding the LeakyReLU activations, similar to this example I found on GitHub.

The model throws a `Nan in summary histogram` error in that configuration. Changing the LSTM activations to `activation='sigmoid'` works well, but seems like the wrong thing to do.

Reading this StackOverflow question suggested "introducing a small value when computing the loss"; I'm just not sure how to do that with a built-in loss function.
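
What I have in mind is something like the following untested sketch (`safe_binary_crossentropy` and the epsilon value are my own guesses): clip the predictions away from 0 and 1 before calling the built-in binary cross-entropy, so the log never sees an exact 0:

    # Untested sketch: wrap the built-in loss and clip predictions away
    # from 0 and 1 so the log inside binary cross-entropy never sees 0.
    from keras import backend as K
    from keras.losses import binary_crossentropy

    def safe_binary_crossentropy(y_true, y_pred):
        eps = 1e-7  # small constant; the exact value is a guess
        y_pred = K.clip(y_pred, eps, 1.0 - eps)
        return binary_crossentropy(y_true, y_pred)

    model.compile(loss=safe_binary_crossentropy, optimizer=opt,
                  metrics=['accuracy'])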

Any help/explanation would be appreciated.

Update: I can see that the loss is nan on the first epoch:

    260/260 [==============================] - 6s 23ms/step -
    loss: nan - acc: 0.5000 - precision: 0.5217 - recall: 0.6512 - f1: nan - val_loss: nan - val_acc: 0.0000e+00 - val_precision: -2147483648.0000 - val_recall: -49941480.1860 - val_f1: nan

Update 2: I've upgraded TensorFlow and Keras to versions 1.12.0 and 2.2.4 respectively. There was no effect.

I also tried adding a loss to the first LSTM layer, as suggested by @Oluwafemi Sule. It looks like a step in the right direction: the loss is no longer nan on the first epoch. However, I still get the same error, probably because of the other nan values, like val_loss / val_f1.

    [==============================] - 7s 26ms/step -
    loss: 1.9099 - acc: 0.5077 - precision: 0.5235 - recall: 0.6544 - f1: 0.5817 - val_loss: nan - val_acc: 0.5172 - val_precision: 35.0000 - val_recall: 0.9722 - val_f1: nan

Update 3: I tried compiling the network with just the accuracy metric, with no success:

    Epoch 1/300
    260/260 [==============================] - 8s 29ms/step - loss: nan - acc: 0.5538 - val_loss: nan - val_acc: 0.0000e+00
  • I had a similar issue once, but mine was due to Nan values in the dataset. – kerastf Oct 31 '18 at 19:47
  • I'm not really sure if your gradients are exploding, because LeakyReLU on its own is not enough to make it converge. But there is generally an option called 'clipnorm' or 'clipvalue' that you can pass to all the optimizers. This helps you clip gradients and is generally used to find a way out of local minima. You could try that here and see if it makes any difference (see the sketch after these comments). [Source](https://keras.io/optimizers/) – kvish Nov 01 '18 at 17:37
  • What version of Keras and TensorFlow are you using? – today Nov 05 '18 at 10:05
  • keras 2.2.2 , tf 1.5.0 – Shlomi Schwartz Nov 05 '18 at 11:49
  • @ShlomiSchwartz Have you tried upgrading TensorFlow and Keras to see if the issue is still there? If it is, then try using the Adam optimizer with default parameters and just modify the learning rate. Try `1e-3`, `1e-4` or `1e-5` as the learning rate. Further, did you try clipnorm for clipping the gradients? Additionally, please use @user_name at the beginning of your comment when you are replying to a specific user, otherwise that user won't be notified of your comment (I was not notified of your previous comment; I just checked this question by chance and saw that you had answered). – today Nov 05 '18 at 12:18
  • @today thanks I'll give it a go, I haven't tried clipnorm, because I'm not sure how exactly, can you please add an answer with a code example? – Shlomi Schwartz Nov 05 '18 at 12:26
  • @ShlomiSchwartz Just pass the `clipnorm=1.0` argument to the optimizer, e.g. `Adam(..., clipnorm=1.0)`. – today Nov 05 '18 at 12:58
  • @today `clipnorm=1.0` did not solve my issue when using `activation='linear'` I still get the `Nan in summary histogram` error (still same TF & Keras versions) – Shlomi Schwartz Nov 05 '18 at 14:01
  • What happens when you increase the alpha argument (say, to 0.3) of the LeakyReLUs? – rvinas Nov 05 '18 at 14:34
  • @rvinas unfortunately it does not help – Shlomi Schwartz Nov 05 '18 at 14:45
  • If the problem is caused by a -Inf in the LSTM layers' outputs, changing LeakyReLU to regular ReLU layers might fix it. I would also check the training set for Nan values. – Mete Han Kahraman Nov 06 '18 at 13:29
  • @ShlomiSchwartz Could you try compiling and training the network without those additional metrics? Only use `accuracy` and see if you still get this error. – today Nov 07 '18 at 12:16
  • @today, please see my edits – Shlomi Schwartz Nov 07 '18 at 12:54
  • Hi @ShlomiSchwartz, can you check how the weights are initialized, and try printing them as the loss is being computed? Theoretically, LSTMs, or any kind of recursive network, are prone to NaNs due to the large number of recursive multiplications and bad initialization of weights. So a skewed dataset being the reason for the NaNs is not probable enough compared with the recursive nature of LSTMs. – najeeb khan Sep 19 '19 at 13:05
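
For reference, here is a minimal sketch of the gradient-clipping suggestion from the comments (the other optimizer arguments are copied from the question; `clipnorm=1.0` is the value suggested above, not a tuned one):

    # Gradient clipping via clipnorm: each gradient tensor is rescaled so
    # that its L2 norm never exceeds 1.0, which can stop exploding
    # gradients from turning the loss into nan.
    from keras import optimizers

    opt = optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999,
                          epsilon=1e-08, decay=0.0001, clipnorm=1.0)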

1 Answer


This answer starts from the suggestion to introduce a small value when computing the loss.

keras.layers.LSTM, like all layers that are direct or indirect subclasses of keras.engine.base_layer.Layer, has an add_loss method that can be used to add a starting value to the loss.

I suggest doing this for the LSTM layer and seeing if it makes any difference in your results.

    # add a constant starting value to the loss computed for this layer
    lstm_layer = LSTM(8, return_sequences=True, activation='linear')
    lstm_layer.add_loss(1.0)

    model.add(lstm_layer)
  • Thanks for your answer. It looks like a step in the right direction; on the first epoch I now see `260/260 [==============================] - 7s 26ms/step - loss: 1.9099 - acc: 0.5077 - precision: 0.5235 - recall: 0.6544 - f1: 0.5817 - val_loss: nan - val_acc: 0.5172 - val_precision: 35.0000 - val_recall: 0.9722 - val_f1: nan`. So the loss is no longer nan; however, I still get the same error ... probably because of other nan values, like the val_loss / val_f1? – Shlomi Schwartz Nov 07 '18 at 08:19