
I used Keras to train a seq2seq model (keras.models.Model). The inputs and targets to the model are [X_encoder, X_decoder] and y, i.e. a list of encoder and decoder inputs, plus the labels. (Note that the decoder input, X_decoder, is 'y' shifted by one position relative to the actual y. Basically, teacher forcing.)
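
For reference, the shifted pairs look roughly like this (a simplified sketch; the padded array and the token ids below are just placeholders, not my actual preprocessing):

import numpy as np

# Hypothetical padded target sequences: [<start>, w1, ..., <end>, <pad>, ...]
# shape (num_samples, max_len + 1); here 1 = <start>, 2 = <end>, 0 = <pad>
target_seqs = np.array([[1, 45, 12, 7, 2, 0]])

X_decoder = target_seqs[:, :-1]  # decoder input, begins with <start>
y = target_seqs[:, 1:]           # labels, the same sequence shifted by one position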

So my question is: after training, when it comes to actual prediction where I do not have any labels, how do I provide X_decoder as an input? Or do I need to train on something else?

This is a snippet of the model definition, in case it helps:

from keras.layers import Input, Dense, TimeDistributed, CuDNNLSTM
from keras.models import Model

# Encoder
encoder_inputs = Input(batch_shape=(batch_size, max_len,), dtype='int32')
encoder_embedding = embedding_layer(encoder_inputs)
encoder_LSTM = CuDNNLSTM(hidden_dim, return_state=True, stateful=True)
encoder_outputs, state_h, state_c = encoder_LSTM(encoder_embedding)

# Decoder
decoder_inputs = Input(shape=(max_len,), dtype='int32')
decoder_embedding = embedding_layer(decoder_inputs)
decoder_LSTM = CuDNNLSTM(hidden_dim, return_state=True, return_sequences=True)
decoder_outputs, _, _ = decoder_LSTM(decoder_embedding, initial_state=[state_h, state_c])

# Output
outputs = TimeDistributed(Dense(vocab_size, activation='softmax'))(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], outputs)

# Model fitting
model.fit([X_encoder, X_decoder], y,
          steps_per_epoch=int(number_of_train_samples / batch_size),
          epochs=epochs)
nightfury
  • A very common approach is to get the model to generate a sample of sequences by just giving some noise to your decoder for a given `encoder input`. Select the most correct sequence from this sample, make some edits, and then train the model with this sequence as `decoder input`. – ashutosh singh Jun 27 '19 at 17:10
  • Okay, I kind of followed what you said, but I have a few more questions. Selecting the most correct sequence from the noise-decoder-input is manual, right? Also, training the model with this decoder input will be on top of the already trained model, right? And if I am anyway selecting random noise and re-training the model on that to get better results, how is this better than 'not' using teacher forcing? This is almost similar to what happens in normal RNNs, where you feed the predicted outputs as the next inputs. – nightfury Jun 27 '19 at 18:43
  • Yes, it is manual; you could use some automated approaches if possible. You have to generate the complete sequence, see where the errors are, correct them, and feed the corrected sequence back into the model. It is not much different from teacher forcing. The difference is that here your decoder will be running in inference mode; you will pass the hidden states of the encoder and some noise to the decoder, and it will try to predict the output sequence. You will then use the best copies (or their variants) produced by your decoder(s) to fine-tune the system further. – ashutosh singh Jun 27 '19 at 19:18

1 Answer


Usually, when you train a seq2seq model, the first token of decoder_inputs is a special <start> token. So when you try to generate a sentence, you do it like this:

first_token = decoder(encoder_state, [<start>])
second_token = decoder(encoder_state, [<start>, first_token])
third_token = decoder(encoder_state, [<start>, first_token, second_token])
...

You execute this recursion until your decoder generates another special token, <end>, and then you stop.

Here is a very crude pythonic decoder for your model. It is inefficient because it reads the input over and over again instead of memorizing the RNN state, but it works.

import numpy as np

input_seq = ...  # some array of token indices, shape (1, max_len)
result = np.array([[START_TOKEN]])
for i in range(100500):  # cap the length to avoid an infinite loop
    # predict the distribution over the next token and take the most likely one
    next_token = model.predict([input_seq, result])[0][-1].argmax()
    if next_token == END_TOKEN:
        break
    result = np.concatenate([result, [[next_token]]], axis=1)
output_seq = result[0][1:]  # omit the leading START_TOKEN

A more efficient solution would output the RNN state along with each token and use it to produce the next token.
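
In case it helps, here is a rough sketch of what that could look like with separate inference models that carry the LSTM states explicitly. Two assumptions that are not in the snippet above: the trained softmax layer is kept in a variable (called output_layer here) so its weights can be reused, and the stateful=True / fixed batch_size settings of the encoder are ignored for simplicity.

import numpy as np

# Encoder inference model: maps an input sequence to its final LSTM states
encoder_model = Model(encoder_inputs, [state_h, state_c])

# Decoder inference model: takes one token plus the previous states and
# returns the token probabilities together with the updated states
decoder_state_input_h = Input(shape=(hidden_dim,))
decoder_state_input_c = Input(shape=(hidden_dim,))
decoder_token_input = Input(shape=(1,), dtype='int32')
decoder_emb = embedding_layer(decoder_token_input)
dec_outputs, dec_h, dec_c = decoder_LSTM(
    decoder_emb, initial_state=[decoder_state_input_h, decoder_state_input_c])
dec_probs = output_layer(dec_outputs)  # reuse the trained softmax layer
decoder_model = Model(
    [decoder_token_input, decoder_state_input_h, decoder_state_input_c],
    [dec_probs, dec_h, dec_c])

# Greedy decoding: encode once, then generate token by token,
# carrying the states forward instead of re-reading the whole prefix
h, c = encoder_model.predict(input_seq)
token = np.array([[START_TOKEN]])
output_seq = []
for _ in range(max_len):
    probs, h, c = decoder_model.predict([token, h, c])
    next_token = probs[0, -1].argmax()
    if next_token == END_TOKEN:
        break
    output_seq.append(next_token)
    token = np.array([[next_token]])

This is essentially the standard Keras seq2seq inference pattern: encode the input once, then decode one token at a time while passing the LSTM states along.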

David Dale