
I'm experimenting with Keras and I'm trying to create both a regular neural network and an LSTM neural network, each with one input layer (2000 inputs), one hidden layer (256 nodes), and one output layer (1 node). Following the guides in the Keras documentation, this is how I've done it:

Regular neural network:

model = Sequential()
model.add(Dense(2000, input_shape=(2000,), activation='sigmoid'))
model.add(Dense(256, activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))

Long short-term memory:

model = Sequential()
model.add(Embedding(2000, 256))
model.add(LSTM(256, activation='tanh', dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

As you can see, for the LSTM network I've used an Embedding layer as the input layer. Is it possible to avoid this? From reading the Keras documentation I don't quite understand why one would want to use an Embedding layer, but it's the only way I could get the LSTM network working.

However, the final test accuracy of these two models differs considerably, even though exactly the same data is used for evaluation. For example, the LSTM gives around 60% accuracy, while the regular network gets about 90%.

Is this due to the use of different types of layers, and can I use a Dense layer as the input layer even though an LSTM layer comes next?

Currently, when I try using a Dense layer before the LSTM layer, I get the error:

ValueError: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2

This is what I tried:

model = Sequential()
model.add(Dense(2000, input_shape=(2000,), activation='sigmoid'))
model.add(LSTM(256, activation='tanh', dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

What I really want is one model that is a very simple regular (non-recurrent) neural network, and one model that is a pure LSTM network: one input layer, one hidden layer, and one output layer, with both models having the same number of nodes.

Stephen Johnson
  • The error message says that the LSTM expects a 3-dimensional input, something with a shape like (x, y, z), while the output you're taking from the previous layer is only two-dimensional, (x, y). You can probably find more details here: http://stackoverflow.com/questions/39674713/neural-network-lstm-keras?rq=1 – Daniel Möller Apr 29 '17 at 02:18
  • Thanks @Daniel, but do you know why it works with an Embedding as the first layer, even though the accuracy generally seems to be worse with this structure? – Stephen Johnson Apr 29 '17 at 15:08
  • is your data sequential / time series? – convolutionBoy Apr 29 '17 at 18:23
  • Because the embedding is outputting the right dimensions. You'd probably be OK simply adding a `Reshape` layer or a time distribution (see the sketch after these comments). – Daniel Möller Apr 30 '17 at 23:17
  • @convolutionBoy yes (I think?), the data is originally sentences. They are converted into numerical feature representations and the goal is to classify them into say A or B. Like sentiment analysis. – Stephen Johnson May 02 '17 at 10:45
  • @Daniel thanks, I will try that. I will answer the question if it works! – Stephen Johnson May 02 '17 at 10:45
  • Have you figured out why the LSTM network performed worse than the normal neural network, even though it should better fit the domain assumptions? I have a similar problem [here](https://stackoverflow.com/questions/54925207/lstm-gru-autoencoder-convergency) – Guido Mar 04 '19 at 23:32
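
A minimal sketch of the `Reshape` suggestion from the comments, applied to the failing Dense-then-LSTM model above. The split of the 2000 Dense outputs into 10 timesteps of 200 features is an arbitrary choice, purely for illustration:

from keras.models import Sequential
from keras.layers import Dense, Reshape, LSTM

model = Sequential()
model.add(Dense(2000, input_shape=(2000,), activation='sigmoid'))
# Reshape the flat 2000-unit Dense output into (timesteps, features)
# so the LSTM receives the 3D input it expects; 10 * 200 = 2000.
model.add(Reshape((10, 200)))
model.add(LSTM(256, activation='tanh', dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))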

1 Answer


I happened to be in the same situation: I already had tensors and did not want to use an Embedding layer before the LSTM. I used @Daniel Möller's suggestion of a Reshape layer. This is what my working model looks like:

from keras.layers import LSTM, Dense, Reshape
from keras.models import Sequential
model_ls = Sequential()
model_ls.add(Reshape((3, 2500), input_shape=(50, 50, 3)))
# LSTM expects a 3D input: (batch, timesteps, features)
model_ls.add(LSTM(128, return_sequences=True))
model_ls.add(LSTM(64))
model_ls.add(Dense(40, activation="relu", name="feat_x"))
model_ls.add(Dense(1, activation="tanh"))
model_ls.compile(optimizer="adadelta", loss="binary_crossentropy", metrics=["acc"])
model_ls.summary()
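
Note that the Reshape is valid because it preserves the total number of elements per sample: 50 × 50 × 3 = 7500 = 3 × 2500, so each (50, 50, 3) input is simply reinterpreted as 3 timesteps of 2500 features.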

Here is what the output looked like:

Train on 201433 samples, validate on 67145 samples
Epoch 1/15
 - 23s - loss: 0.0252 - acc: 0.9981 - val_loss: 0.0260 - val_acc: 0.9980
Epoch 2/15
 - 23s - loss: 0.0252 - acc: 0.9981 - val_loss: 0.0260 - val_acc: 0.9980
........
Epoch 15/15
 - 23s - loss: 0.0250 - acc: 0.9981 - val_loss: 0.0257 - val_acc: 0.9980
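
If your data already comes as sequences of feature vectors, you can also skip both the Embedding and the Reshape by giving the LSTM a 3D input shape directly. A minimal sketch with placeholder dimensions (100 timesteps of 20 features; not taken from the question's data):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# input_shape = (timesteps, features); these values are placeholders.
model.add(LSTM(256, input_shape=(100, 20), activation='tanh',
               dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
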
Sheraz