Keras-- low accuracy with LSTM layer but the accuracy is good without LSTM

Question

I am training a model in Keras on the IMDB dataset. With the following model, which uses LSTM layers, the accuracy is only about 50%:

 model = Sequential()
 model.add(Embedding(max_features, 32))
 model.add(LSTM(32, return_sequences=True))
 model.add(LSTM(32, return_sequences=True))
 model.add(LSTM(32))
 model.add(Dense(1, activation='sigmoid'))

Accuracy:

loss: 0.6933 - acc: 0.5007 - val_loss: 0.6932 - val_acc: 0.4947

I have also tried a single LSTM layer, but it gives similar accuracy.

However, if I don't use an LSTM layer, the accuracy reaches around 82%:

model = models.Sequential()
model.add(layers.Dense(16, kernel_regularizer=regularizers.l1(0.001), activation='relu', input_shape=(10000,)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(16, kernel_regularizer=regularizers.l1(0.001), activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))

Accuracy:

 loss: 0.6738 - acc: 0.8214 - val_loss: 0.6250 - val_acc: 0.8320

This is how I compile and fit the model:

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model.fit(partial_x_train, partial_y_train, epochs=Numepochs, batch_size=Batchsize, validation_data=(x_val, y_val))

How can this be explained? I thought LSTM works great for sequential text data?

Could you show the data loading and preprocessing stage for the LSTM model? — today, Nov 18 '18 at 09:26
There is really not any preprocessing(except vectorizing sequences) because I am using already available default imdb dataset from keras. — MessitÖzil, Nov 18 '18 at 10:17
Ok. Could you explain or show the vectorizing step? I think I know what's going wrong but I want to make sure. — today, Nov 18 '18 at 10:33
This is the exact function I am using https://stackoverflow.com/questions/50213274/vectorize-sequences-explanation — MessitÖzil, Nov 18 '18 at 13:35

score 5 · Accepted Answer · answered Nov 18 '18 at 14:04

Don't forget that LSTM is used for processing sequences such as time series or text data. In a sequence, the order of the elements is very important: if you reorder the elements, the meaning of the sequence might change completely.

The problem in your case is that the preprocessing step you have used is not suitable for an LSTM model. You are encoding each review as a vector in which each element represents the presence or absence of a particular word. Therefore, you are completely discarding the order in which the words appear, which is exactly what an LSTM layer is good at modeling. There is also a second issue with your LSTM model, given this preprocessing scheme: the Embedding layer expects word indices as input, not a vector of zeros and ones (i.e. the output of your preprocessing stage).
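To make the loss of order concrete, here is a minimal pure-Python sketch of presence/absence encoding. The helper name `multi_hot` and the two toy "reviews" are made up for illustration; they are not taken from the linked `vectorize_sequences` function, though they follow the same idea.

```python
def multi_hot(sequence, dimension):
    """Encode a sequence of word indices as a 0/1 presence vector."""
    vector = [0] * dimension
    for index in sequence:
        vector[index] = 1  # only records that the word occurred, not where
    return vector


# Two hypothetical reviews: same word indices, different order.
review_a = [3, 7, 1, 5]
review_b = [5, 1, 7, 3]

encoded_a = multi_hot(review_a, dimension=10)
encoded_b = multi_hot(review_b, dimension=10)

print(encoded_a)
print(encoded_a == encoded_b)  # the two encodings are indistinguishable
```

Both reviews map to the same vector, so nothing downstream can recover the word order. On top of that, an Embedding layer would read each 0 or 1 as a word index rather than as a presence flag.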

Since the IMDB data is already stored as sequences of word indices, the only preprocessing you need is to pad or truncate the sequences to a fixed length so that they can be processed in batches. For example:

from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences

vocab_size = 10000 # only consider the 10000 most frequent words
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)

x_train = pad_sequences(x_train, maxlen=500)  # truncate or pad sequences to make them all have a length of 500

Now x_train has a shape of (25000, 500): it consists of 25000 sequences of length 500, encoded as integer word indices. You can use it for training by passing it to the fit method. I guess you can reach at least 80% training accuracy with an Embedding layer and a single LSTM layer. Don't forget to use a validation scheme to monitor overfitting (one simple option is to set the validation_split argument when calling the fit method).
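As a rough illustration of what padding and truncating do to a single review, here is a pure-Python sketch. The helper `pad_or_truncate` is hypothetical and is not part of Keras. It only mimics the default behaviour of pad_sequences as I understand it (zeros added at the front, overlong sequences cut from the front), and it assumes maxlen is positive.

```python
def pad_or_truncate(sequence, maxlen, value=0):
    """Return a list of exactly `maxlen` items (assumes maxlen > 0).

    Longer sequences keep their last `maxlen` items; shorter ones
    get `value` prepended until they reach `maxlen`.
    """
    trimmed = sequence[-maxlen:]
    return [value] * (maxlen - len(trimmed)) + trimmed


# Toy word-index sequences, not real IMDB data.
short_review = [4, 9, 2]
long_review = [4, 9, 2, 8, 6, 1, 7]

padded = pad_or_truncate(short_review, maxlen=5)
truncated = pad_or_truncate(long_review, maxlen=5)

print(padded)     # [0, 0, 4, 9, 2]
print(truncated)  # [2, 8, 6, 1, 7]
```

Unlike the presence/absence encoding, this keeps the word indices in their original order, which is what the Embedding and LSTM layers need.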
