I am trying to build a univariate encoder-decoder LSTM model, and I keep getting this error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

I have already searched and read the other posts about the same error; however, I am sure the data doesn't contain any NaN values.
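
A quick check like the following (a sketch using NumPy; trX and targetTRAIN are the training arrays passed to model.fit below) is how bad values in the data itself can be ruled out:

import numpy as np

# np.isfinite is False for NaN, +Inf and -Inf, so one call covers all three.
assert np.isfinite(trX).all(), "trX contains NaN or Inf"
assert np.isfinite(targetTRAIN).all(), "targetTRAIN contains NaN or Inf"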

The NaN values result from the LSTM's internal computations. What makes me sure about this is the following:

I wrote a loop that calls model.fit once per epoch and prints the history:

for j in range(numEpoch):
  history = model.fit(trX, targetTRAIN, epochs=1, batch_size=batchSize, verbose=0, shuffle=False, validation_split=valdSplit)
  print(history.history)

It trains fine until roughly epoch 610 (sorry, I forgot the exact number), and then it starts showing NaN as the validation loss.
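
To pin down the exact epoch automatically, Keras ships a TerminateOnNaN callback; a minimal sketch (assuming the tf.keras import path) that stops training at the first NaN loss:

from tensorflow.keras.callbacks import TerminateOnNaN

# Aborts training as soon as a batch produces a NaN loss, so the exact
# epoch/batch where things blow up is easy to spot in the verbose output.
model.fit(trX, targetTRAIN, epochs=numEpoch, batch_size=batchSize,
          verbose=1, shuffle=False, validation_split=valdSplit,
          callbacks=[TerminateOnNaN()])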

Here is my model definition:

# Imports assumed for this snippet (tf.keras):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.optimizers import SGD

numEpoch = 2000
batchSize = 1
actFunc = 'relu'
valdSplit = 0.1
dropOutRate = 0.2
optimizer = SGD(lr=0.01, momentum=0.9)

# randSeed, timeStep and numFeat are defined earlier in my script.
model = Sequential()
randSeed = randSeed + 1
kernelInitializer = RandomNormal(seed=randSeed)
model.add(LSTM(30, batch_input_shape=(batchSize, timeStep, numFeat),
               activation=actFunc, kernel_initializer=kernelInitializer,
               dropout=dropOutRate, stateful=True, return_sequences=True))
model.add(Dropout(dropOutRate))
model.add(LSTM(20, kernel_initializer=kernelInitializer,
               stateful=False, activation=actFunc, return_sequences=False))
model.add(Dropout(dropOutRate))
randSeed = randSeed + 1
kernelInitializer = RandomNormal(seed=randSeed)
model.add(Dense(numFeat, kernel_initializer=kernelInitializer, activation='linear'))
model.compile(loss='mean_squared_error', optimizer=optimizer, metrics=['accuracy'])
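
To support the claim that the blow-up happens inside the network rather than in the data, a check like the following after a failing epoch (a sketch; get_weights() and layer.name are standard Keras attributes) shows whether the layer weights themselves have turned non-finite:

import numpy as np

# Inspect every layer's weights; if any contain NaN/Inf, the divergence
# happened during training rather than in the input data.
for layer in model.layers:
    for w in layer.get_weights():
        if not np.isfinite(w).all():
            print('non-finite weights in layer:', layer.name)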

trX (Train_X) has shape (362, 3, 27), i.e. (samples, timeStep, numFeat).

I am happy to give more details if needed.

Noori
  • How high is the dropout rate? And what is actFunc? And is there a reason why your first LSTM layer has stateful set to true and the second does not? – MichaelJanz Aug 13 '20 at 06:48
  • @MichaelJanz I added the dropout rate to the code, thanks for asking. As for statefulness, no, there is no reason; I am just a beginner playing with the configuration to see how I can improve it. Do you think it doesn't make sense to have one layer stateless while the other is stateful? – Noori Aug 13 '20 at 08:30
  • Statefulness means that the LSTM carries its current state over to the next batch (as far as my understanding goes). Normally you use it when you have very long sequences. Because LSTMs are bad with long sequences (say, around 1000 steps), you can split those sequences but tell the LSTM that the next sequence is a successor of the previous one. You do that with stateful=True. So in the first LSTM you are basically saying: the sequences belong together, while in the second LSTM you say: they don't (a sketch of this stateful/reset pattern follows the comments). – MichaelJanz Aug 13 '20 at 08:49
  • Oh, I see, you made it very clear. Thank you so much @MichaelJanz. Do you have any answer for the NaN case; do you think the state of the layer is the reason? – Noori Aug 13 '20 at 09:03
  • It might be the reason; I recommend you try it out. If you think my comment helped you, feel free to upvote it. – MichaelJanz Aug 13 '20 at 09:07
  • Unfortunately, the problem is still there even after I changed the second layer to be stateful. While fitting the model, after some epochs it starts to show NaN as the loss; any help? – Noori Aug 14 '20 at 12:55
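
Picking up the statefulness point from the comments: with stateful=True, Keras carries the LSTM state across batches, and the usual pattern is to reset it manually between epochs. A minimal sketch of that pattern, using model.reset_states() (standard Keras API for stateful models) inside the per-epoch loop from the question:

for j in range(numEpoch):
  history = model.fit(trX, targetTRAIN, epochs=1, batch_size=batchSize,
                      verbose=0, shuffle=False, validation_split=valdSplit)
  model.reset_states()  # start the next epoch's sequences from a fresh state
  print(history.history)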

0 Answers