
I am trying to build a stateful LSTM with Keras, and I don't understand how to add an embedding layer before the LSTM runs. The problem seems to be the stateful flag. If my net is not stateful, adding the embedding layer is quite straightforward and works; a sketch of that variant follows below.
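
For reference, a sketch of the non-stateful variant that works (the sizes, loss, and optimizer here are placeholders):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense, Activation

# placeholder sizes, for illustration only
vocabSize, EMBEDDING_DIM, longest_sequence, maximal_value = 100, 16, 20, 100

model = Sequential()
# without stateful=True, only the sequence length is needed, not the batch size
model.add(Embedding(vocabSize + 1, EMBEDDING_DIM, input_length=longest_sequence))
model.add(LSTM(EMBEDDING_DIM, return_sequences=True))
model.add(TimeDistributed(Dense(maximal_value)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')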

A working stateful LSTM without an embedding layer currently looks like this:

model = Sequential()
model.add(LSTM(EMBEDDING_DIM,
               batch_input_shape=(batchSize, longest_sequence, 1),
               return_sequences=True,
               stateful=True))
model.add(TimeDistributed(Dense(maximal_value)))
model.add(Activation('softmax'))
model.compile(...)

When adding the Embedding layer, I move the batch_input_shape parameter into the Embedding layer, i.e. only the first layer needs to know the shape? Like this:

model = Sequential()
model.add(Embedding(vocabSize + 1, EMBEDDING_DIM,
                    batch_input_shape=(batchSize, longest_sequence, 1)))
model.add(LSTM(EMBEDDING_DIM,
               return_sequences=True,
               stateful=True))
model.add(TimeDistributed(Dense(maximal_value)))
model.add(Activation('softmax'))
model.compile(...)

The exception I now get is:

Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4

So I am stuck here at the moment. What is the trick to combining word embeddings with a stateful LSTM?

toobee

1 Answer


The batch_input_shape parameter of the Embedding layer should be (batch_size, time_steps), where time_steps is the sequence length (the number of steps the LSTM is unrolled over) and batch_size is the number of examples in a batch. Your current input shape (batchSize, longest_sequence, 1) is 3-D, and the Embedding layer appends the embedding dimension to whatever shape it receives, so it outputs a 4-D tensor; that is exactly the ndim=4 the LSTM complains about. Feed the Embedding layer 2-D integer sequences instead:

from keras.models import Sequential
from keras.layers import Embedding, LSTM

model = Sequential()
model.add(Embedding(
    input_dim=input_dim,        # e.g., 10 if you have 10 words in your vocabulary
    output_dim=embedding_size,  # size of the embedded vectors
    input_length=time_steps,
    batch_input_shape=(batch_size, time_steps)
))
model.add(LSTM(
    10,
    batch_input_shape=(batch_size, time_steps, embedding_size),
    return_sequences=False,
    stateful=True
))

There is an excellent blog post which explains stateful LSTMs in Keras. I have also uploaded a gist which contains a simple example of a stateful LSTM with an Embedding layer.
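
For completeness, a minimal end-to-end sketch in the same spirit (the sizes, dummy data, loss, and optimizer are made up for illustration; the manual reset_states() between epochs is the usual pattern for stateful models):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# placeholder sizes, for illustration only
batch_size, time_steps = 4, 10
vocab_size, embedding_size = 100, 8

model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=embedding_size,
                    batch_input_shape=(batch_size, time_steps)))
model.add(LSTM(10, return_sequences=False, stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

# dummy data: integer word indices in, one scalar target per sequence;
# with stateful=True the number of samples must be a multiple of batch_size
x = np.random.randint(0, vocab_size, size=(batch_size * 5, time_steps))
y = np.random.rand(batch_size * 5, 1)

# shuffling must be off so batches stay aligned across epochs,
# and the hidden state is reset manually between epochs
for epoch in range(3):
    model.fit(x, y, batch_size=batch_size, epochs=1, shuffle=False, verbose=0)
    model.reset_states()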

Stefan
  • How do you decide the embedding_size or find out the size of the embedded vectors? – naisanza Sep 05 '17 at 05:22
  • @naisanza The embedding_size is a hyperparameter. This means that the embedding_size depends on your problem, and you are free to choose it. Unfortunately, I cannot really give you a general answer on how to choose good hyperparameters, but https://arxiv.org/pdf/1206.5533.pdf provides a good starting point on that topic. – Stefan Sep 11 '17 at 07:58