
I've been unable to work out the dimensions for an RNN encoder-decoder architecture. I understand how LSTMs work, but I'm struggling to implement this one in Keras. After reading the documentation and various Q&As, it looks as though the network output has to match the dimensions of the entire set of targets rather than a single target, which makes no sense to me. I'm sure I've read this wrong and that the output only needs to match the dimensions of the target for a given xi (setting aside questions of batching for now). After several hours of fiddling I'm more confused than when I started. I suspect that embedding the inputs to the RNN but not the outputs has something to do with it, and that I may need to flatten the network somewhere along the way.

Here's the setup:

  • The dataset is a large number of Q&A pairs. I am working with a sample of 1440 pairs to build out the infrastructure.
    • xi:"what is the capital of the US?"
    • yi: "I think the capital is Washington"
  • After preprocessing, there are two NumPy arrays -- one for X and one for Y. Each row corresponds to a row in the original dataset (a sketch of this step follows the list), e.g.:
    • Processed xi: [253, 8, 25, 208, 28, 1]
    • Processed yi: [827, 10, 25, 208, 8, 198]
  • There is an embedding layer for the input sequences (initialized with pretrained GloVe vectors), but I don't think one is necessary for the output sequences.
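
For concreteness, here is a minimal sketch of that preprocessing step; the names q_texts and a_texts (the raw question and answer strings) are placeholders, not my actual variable names:

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

maxlen = 95

# Shared vocabulary over questions and answers
tokenizer = Tokenizer()
tokenizer.fit_on_texts(q_texts + a_texts)
vocabulary_size = len(tokenizer.word_index) + 1   # 4046 in my case

# Integer-encode and pad both sides to the same length
X = pad_sequences(tokenizer.texts_to_sequences(q_texts), maxlen=maxlen)
Y = pad_sequences(tokenizer.texts_to_sequences(a_texts), maxlen=maxlen)
# X.shape == (1440, 95), Y.shape == (1440, 95) -- integer ids, not one-hot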

Here is the code:

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense, TimeDistributed

model = Sequential()
# GloVe-initialized embedding for the input sequences
model.add(Embedding(vocabulary_size, embed_size, input_length=maxlen, weights=[embedding_matrix]))
model.add(Bidirectional(LSTM(embed_size, return_sequences=True)))
model.add(LSTM(embed_size, return_sequences=True))

if dropout < 1.0:
    model.add(Dropout(dropout))

# One softmax per timestep
model.add(TimeDistributed(Dense(embed_size, activation='softmax')))
# model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

model.fit(X_train, y_train, batch_size=32, epochs=1)

Here is the network summary:

Layer (type)                 Output Shape              Param #
embedding_29 (Embedding)     (None, 95, 100)           404600
bidirectional_12 (Bidirectio (None, 95, 200)           160800
lstm_45 (LSTM)               (None, 95, 100)           120400
time_distributed_18 (TimeDis (None, 95, 100)           10100
Total params: 695,900
Trainable params: 695,900
Non-trainable params: 0

Here is the error:

ValueError: Error when checking target: expected time_distributed_18 to have 3 dimensions, but got array with shape (1440, 95)
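
From what I can tell, the loss expects the targets to be 3-D to match the per-timestep softmax. Here is a sketch of the two fixes I've seen suggested (untested on my end); note that the final Dense would presumably also need vocabulary_size units rather than embed_size to predict words:

import numpy as np
from keras.utils import to_categorical

# Option 1: one-hot the integer targets -> shape (1440, 95, 4046)
y_train_onehot = to_categorical(y_train, num_classes=vocabulary_size)

# Option 2: keep integer targets, add a trailing axis -> shape (1440, 95, 1),
# and switch to the sparse variant of the loss
y_train_sparse = np.expand_dims(y_train, -1)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])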

Other details:

  • maxlen: the maximum length of the input and output sequences is 95
  • embed_size: the dimensionality of the word embedding is 100
  • vocabulary_size: the size of the vocabulary is 4046
Jake

1 Answer


One problem is that you are not actually building an encoder-decoder model. Currently you are training a single model that takes a question and immediately answers it. For an encoder-decoder model you need two models: the first maps the input to an encoded state, and the second learns to take that encoded state together with the already generated part of the answer and predict the next word.
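
For illustration, here is a minimal sketch of that two-model setup using the functional API, reusing the names from your question (vocabulary_size, embed_size, embedding_matrix); latent_dim and the shifted decoder inputs are assumptions on my part:

from keras.models import Model
from keras.layers import Input, LSTM, Embedding, Dense

latent_dim = 100

# Encoder: compress the question into its final LSTM state
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(vocabulary_size, embed_size, weights=[embedding_matrix])(encoder_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_emb)
encoder_states = [state_h, state_c]

# Decoder: given the encoded state plus the answer-so-far, predict the next word
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(vocabulary_size, embed_size)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb_layer(decoder_inputs), initial_state=encoder_states)
decoder_dense = Dense(vocabulary_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Trained with teacher forcing: the decoder input is the answer prefixed with a
# start token, and the target is the answer shifted one step to the left.
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')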

You can find example Keras code in the official sequence-to-sequence tutorial ("A ten-minute introduction to sequence-to-sequence learning in Keras").
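
Continuing the sketch above, at inference time you wrap the trained layers into separate encoder and decoder models and decode one word at a time, feeding each prediction back in; START_ID and END_ID are hypothetical special tokens you would add to the vocabulary:

import numpy as np

encoder_model = Model(encoder_inputs, encoder_states)

state_h_in = Input(shape=(latent_dim,))
state_c_in = Input(shape=(latent_dim,))
dec_out, h, c = decoder_lstm(dec_emb_layer(decoder_inputs),
                             initial_state=[state_h_in, state_c_in])
decoder_model = Model([decoder_inputs, state_h_in, state_c_in],
                      [decoder_dense(dec_out), h, c])

def decode(question_ids, max_steps=95):
    states = encoder_model.predict(question_ids)   # question_ids: shape (1, timesteps)
    target = np.array([[START_ID]])
    answer = []
    for _ in range(max_steps):
        probs, h, c = decoder_model.predict([target] + states)
        next_id = int(np.argmax(probs[0, -1]))
        if next_id == END_ID:
            break
        answer.append(next_id)
        target = np.array([[next_id]])
        states = [h, c]
    return answer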

Syrius