I have a block of lines of text, and for each line I want to predict the next word (word_0 -> word_1, then from word_0 and word_1 -> word_2, and so on for each line). There is a great tutorial with code here: Predict next word Source code
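
For context, a minimal sketch of that pair construction (the helper name and the word2idx mapping are my own; it assumes each line is already tokenized into a list of words):

import numpy as np

def build_pairs(tokenized_lines, word2idx, max_len):
    """For each line, every prefix word_0..word_{k-1} predicts word_k."""
    X, y = [], []
    for line in tokenized_lines:
        idxs = [word2idx[w] for w in line if w in word2idx]
        for k in range(1, len(idxs)):
            X.append(idxs[:k])
            y.append(idxs[k])
    # Left-pad prefixes to a common length. Beware: pad value 0 collides
    # with whichever word has index 0, unless the Embedding layer masks it
    # or the vocabulary indices are shifted by one.
    X_pad = np.zeros((len(X), max_len), dtype=np.int32)
    for i, seq in enumerate(X):
        trunc = seq[-max_len:]
        X_pad[i, -len(trunc):] = trunc
    return X_pad, np.array(y)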

But in my case the loss doesn't decrease:

....
Epoch 42/50
2668/2668 [==============] - 1777s 666ms/step - loss: 4.6435 - acc: 0.1361
Epoch 43/50
2668/2668 [==============] - 1791s 671ms/step - loss: 4.6429 - acc: 0.1361
Epoch 44/50
2668/2668 [==============] - 1773s 665ms/step - loss: 4.6431 - acc: 0.1361
Epoch 45/50
2668/2668 [==============] - 1770s 664ms/step - loss: 4.6417 - acc: 0.1361
Epoch 46/50
2668/2668 [==============] - 1774s 665ms/step - loss: 4.6436 - acc: 0.1361
....

My LSTM NN setting:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Activation

nn_model = Sequential()
# Embedding layer initialized with the pretrained Word2Vec vectors
nn_model.add(Embedding(input_dim=vocab_size, output_dim=embedding_size,
                       weights=[pretrained_weights]))
nn_model.add(LSTM(units=embedding_size, return_sequences=True))
nn_model.add(LSTM(units=embedding_size))
# project the LSTM output onto the vocabulary and take a softmax
nn_model.add(Dense(units=vocab_size))
nn_model.add(Activation('softmax'))
nn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])

where:

pretrained_weights = model.wv.syn0  # model is a gensim Word2Vec model
vocab_size, embedding_size = pretrained_weights.shape
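
Note: in recent gensim releases syn0 is deprecated; the same embedding matrix is exposed as model.wv.vectors, so an equivalent is:

pretrained_weights = model.wv.vectors  # same matrix, current gensim attribute
vocab_size, embedding_size = pretrained_weights.shape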

I tried changing the batch_size (128, 64, 20, 10) and tried adding more LSTM layers, but nothing helps. What's wrong and how can I fix this?

  • Have you tried decreasing the learning rate? And what is the size of your LSTM layer? Have you experimented with that? – Vaibhav gusain Feb 18 '19 at 09:02
  • These oscillating loss values look like an issue with too high a learning rate. I think in Keras you can use [optimizers](https://keras.io/optimizers/) that will take care of adapting the learning rate for you. – user2314737 Feb 18 '19 at 09:06
  • LSTM layer size = output_dim = 100 for my example. Changing the optimizer doesn't help either. –  Feb 18 '19 at 09:08
  • I've tried different optimizers and tried lowering the learning rate (lr=0.0005), but I get results no better than: loss: 3.1010 - acc: 0.2000 –  Feb 18 '19 at 10:05
  • If the loss went down to ~3 with a lower learning rate, it looks like it's not a hyperparameter problem. I'd check the input data to make sure it's OK (visualize it any and every way you can) and ensure the task is not too hard for the model. – David Jul 26 '19 at 11:40
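
Putting the comment suggestions into code, a minimal sketch assuming the Keras 2.x API (where the Adam argument is still named lr) and hypothetical training arrays X, y:

from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau

# explicit optimizer with a lowered learning rate instead of the string 'adam'
nn_model.compile(optimizer=Adam(lr=0.0005),
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])

# halve the learning rate whenever the training loss plateaus for 3 epochs
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.5, patience=3, min_lr=1e-5)
nn_model.fit(X, y, batch_size=128, epochs=50, callbacks=[reduce_lr])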
