
This is apparently the code for a seq2seq model with embedding that I wrote:

    # Assumed defined elsewhere: MAX_LEN, HIDDEN_DIM, VOCAB_SIZE and embed_layer
    # (an Embedding layer shared by the encoder and decoder inputs).
    from keras.layers import Input, LSTM, Dense, TimeDistributed
    from keras.models import Model

    def build_models():  # illustrative wrapper; the original `return` implies a function
        # ----- training model -----
        encoder_inputs = Input(shape=(MAX_LEN,), dtype='int32')
        encoder_embedding = embed_layer(encoder_inputs)
        encoder_LSTM = LSTM(HIDDEN_DIM, return_state=True)
        encoder_outputs, state_h, state_c = encoder_LSTM(encoder_embedding)
        encoder_states = [state_h, state_c]

        decoder_inputs = Input(shape=(MAX_LEN,))
        decoder_embedding = embed_layer(decoder_inputs)
        decoder_LSTM = LSTM(HIDDEN_DIM, return_state=True, return_sequences=True)
        decoder_outputs, _, _ = decoder_LSTM(
            decoder_embedding, initial_state=encoder_states)

        # build the output layer once so the inference decoder below reuses
        # the same (trained) weights instead of a fresh, untrained Dense
        decoder_dense = TimeDistributed(Dense(VOCAB_SIZE, activation='softmax'))
        outputs = decoder_dense(decoder_outputs)
        model = Model([encoder_inputs, decoder_inputs], outputs)

        # ----- inference (sampling) models -----
        encoder_model = Model(encoder_inputs, encoder_states)

        # the encoder state vectors have HIDDEN_DIM units, so declare that shape
        decoder_state_input_h = Input(shape=(HIDDEN_DIM,))
        decoder_state_input_c = Input(shape=(HIDDEN_DIM,))
        decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
        decoder_outputs, state_h, state_c = decoder_LSTM(
            decoder_embedding, initial_state=decoder_states_inputs)
        decoder_states = [state_h, state_c]
        outputs = decoder_dense(decoder_outputs)
        decoder_model = Model(
            [decoder_inputs] + decoder_states_inputs, [outputs] + decoder_states)
        return model, encoder_model, decoder_model

We are using the inference models (the encoder and decoder models) for prediction, but I am not sure where the training is happening for the encoder and the decoder.

Edit 1

The code is built upon https://keras.io/examples/lstm_seq2seq/, with an added embedding layer and a TimeDistributed dense layer.
For more info on the issue: github repo

  • you said 'apparently the code for seq2seq model with embedding that i wrote' - did you actually write it or copy it from somewhere? Please share references, your data, and the context of the problem. – Ramsha Siddiqui Feb 09 '20 at 09:33

1 Answer


The encoder and decoder are trained simultaneously; more precisely, the model that is composed of these two is trained, which in turn trains both of them (this is not a GAN, where you would need some fancy training cycle).

If you look closely at the provided link, there is a section where the model is trained:

    # Run training
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
              batch_size=batch_size,
              epochs=epochs,
              validation_split=0.2)
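
To see where those trained weights get used afterwards, here is a minimal sketch of how the two sampling models are typically driven after `fit`. It is an illustration, not code from the original post: the `word2idx`/`idx2word` lookup tables and the `<start>`/`<end>` tokens are assumed to exist.

    import numpy as np

    def decode_sequence(input_seq):  # input_seq: shape (1, MAX_LEN), int token ids
        # run the trained encoder once to get the initial decoder state [h, c]
        states = encoder_model.predict(input_seq)

        # decoder_inputs is fixed at MAX_LEN, so re-feed the tokens generated
        # so far (zero-padded) and read the softmax at the current position
        target_seq = np.zeros((1, MAX_LEN), dtype='int32')
        target_seq[0, 0] = word2idx['<start>']
        decoded = []
        for t in range(MAX_LEN - 1):
            probs, _, _ = decoder_model.predict([target_seq] + states)
            next_token = int(np.argmax(probs[0, t, :]))
            if next_token == word2idx['<end>']:
                break
            decoded.append(idx2word[next_token])
            target_seq[0, t + 1] = next_token
        return ' '.join(decoded)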

Edit: from comments

If you look more closely, the "new" model that you are defining after `fit` consists of layers that have already been trained in the previous step. E.g., in `Model(encoder_inputs, encoder_states)`, both `encoder_inputs` and `encoder_states` were used during the initial training; you are just repackaging them.
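
A minimal way to check this, assuming the `model` and `encoder_model` variables built above: the sampling models are composed of the very same layer objects, and therefore hold the very same weight arrays that `fit` just updated.

    # every layer of the inference encoder also lives inside the trained model;
    # Model(...) wires existing layers together, it does not copy their weights
    print(set(encoder_model.layers) <= set(model.layers))  # True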

Matus Dubrava
  • Yes, sure, you are pointing that out correctly, but that is for training mode; for inference mode we are creating a new encoder and decoder model: `encoder_model = Model(encoder_inputs, encoder_states)` and `decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)` – saransh bhatnagar Feb 09 '20 at 10:08
  • I don't understand your question. You either train your model in "training mode" or do the inference in "inference mode". You don't mix these two together. – Matus Dubrava Feb 09 '20 at 10:10
  • If you want to augment your already trained model with some additional layers that have trainable parameters, then you need to retrain the model with those additional layers included. – Matus Dubrava Feb 09 '20 at 10:12
  • Yes, got it. So when in inference mode we are not training (not updating weights)? Is that right? – saransh bhatnagar Feb 09 '20 at 10:14
  • Yes, that is right. But technically speaking, there is no such thing as a training or an inference mode. What matters is which method you are calling: `fit` for training and `predict` for inference. If you are calling `predict`, no training is happening (see the sketch after these comments). – Matus Dubrava Feb 09 '20 at 10:16
  • Just after `fit` we are initializing our encoder_model with `encoder_model = Model(encoder_inputs, encoder_states)`, so how come the trained weights are carried forward? I know I am missing something; this may seem naive, but I am still not able to understand how we are using an untrained encoder and decoder for prediction. – saransh bhatnagar Feb 09 '20 at 10:19
  • If you look more closely, the "new" model that you are defining after `fit` consists of layers that have already been trained in the previous step, e.g. in `Model(encoder_inputs, encoder_states)`, both `encoder_inputs` and `encoder_states` were used during the initial training; you are just repackaging them. – Matus Dubrava Feb 09 '20 at 10:27
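
A minimal sketch of the `fit` vs `predict` point from the comments above, reusing the variables from the training snippet (it assumes the data arrays exist as in the linked example):

    import numpy as np

    # predict() is a pure forward pass; only fit() updates weights
    before = [w.copy() for w in model.get_weights()]
    model.predict([encoder_input_data, decoder_input_data])
    unchanged = all(np.array_equal(b, a)
                    for b, a in zip(before, model.get_weights()))
    print(unchanged)  # True: inference did not touch the weights

    model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
              batch_size=batch_size, epochs=1)
    unchanged = all(np.array_equal(b, a)
                    for b, a in zip(before, model.get_weights()))
    print(unchanged)  # False: training updated the weights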