How to add Dropout in Encoder-Decoder Seq2Seq model

Question

I am training an encoder-decoder model for language translation, but val_acc fluctuates and does not go beyond 16%. I decided to add Dropout to reduce overfitting, but I have not been able to work out how to do it.

Please help me add dropout to my code, shown below:

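from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

# Encoder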
encoder_inputs = Input(shape=(None,))
enc_emb =  Embedding(num_encoder_tokens +1, latent_dim, mask_zero = True)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]


# Decoder
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens +1, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb,
                                     initial_state=encoder_states)

decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

score 0 · Answer 1 · answered Mar 11 '21 at 14:15

What is the training accuracy? I am assuming it is high (>80%), since you say the model is overfitting.

If that is the case, i.e. the model really is overfitting, you can add dropout at several points:

Pre-dense layer

decoder_outputs, _, _ = decoder_lstm(dec_emb,
                                     initial_state=encoder_states)

dropout = Dropout(rate=0.5)
decoder_outputs = dropout(decoder_outputs)

decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
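
One thing to keep in mind with the Dropout layer above: Keras only applies it during training, so it is a no-op at inference time. Here is a small sketch that illustrates this; the input array is a made-up stand-in for the decoder output, not your real data.

```python
import numpy as np
from tensorflow.keras.layers import Dropout

dropout = Dropout(rate=0.5)

# Hypothetical stand-in for decoder LSTM output: (batch, time steps, units)
x = np.ones((1, 4, 8), dtype="float32")

# Inference mode: the layer passes values through unchanged.
y_infer = np.array(dropout(x, training=False))

# Training mode: roughly half the values are zeroed and the rest
# are scaled by 1 / (1 - rate), i.e. 2.0 for rate=0.5.
y_train = np.array(dropout(x, training=True))

print(y_infer[0, 0])
print(y_train[0, 0])
```

When you call model.fit and model.predict, Keras sets the training flag for you, so you normally do not pass it yourself.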

Dropout in the encoder and decoder LSTMs. See the dropout and recurrent_dropout arguments in https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM
Dropout at embedding layer
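
For the LSTM and embedding options, a minimal sketch based on the model in the question could look like the following. The vocabulary sizes, latent_dim and drop_rate are placeholder values I made up so the snippet runs on its own; the rate in particular is only an assumed starting point that you would tune on validation data.

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Dropout
from tensorflow.keras.models import Model

# Hypothetical sizes, chosen only to make the sketch self-contained.
num_encoder_tokens = 1000
num_decoder_tokens = 1200
latent_dim = 64
drop_rate = 0.3  # assumed value, not a recommendation

# Encoder: dropout on the embedded inputs and inside the LSTM.
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(num_encoder_tokens + 1, latent_dim, mask_zero=True)(encoder_inputs)
enc_emb = Dropout(drop_rate)(enc_emb)
encoder_lstm = LSTM(latent_dim, return_state=True,
                    dropout=drop_rate, recurrent_dropout=drop_rate)
_, state_h, state_c = encoder_lstm(enc_emb)
encoder_states = [state_h, state_c]

# Decoder: same idea, using the encoder states as the initial state.
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(num_decoder_tokens + 1, latent_dim, mask_zero=True)(decoder_inputs)
dec_emb = Dropout(drop_rate)(dec_emb)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True,
                    dropout=drop_rate, recurrent_dropout=drop_rate)
decoder_outputs, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
```

Here dropout applies to the inputs of each LSTM step and recurrent_dropout to the recurrent state. Note that a non-zero recurrent_dropout prevents TensorFlow from using the fast cuDNN kernel, so training on a GPU may get noticeably slower.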

To choose where to add dropout, you need to find out why your model is overfitting. Are there too few training samples? Is the vocabulary size too small? Is the model learning constant behavior for all inputs?
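
Before changing anything, it may be worth confirming that the model is overfitting at all, by comparing training and validation accuracy from the history returned by model.fit. Below is a rough sketch; accuracy_gap is a hypothetical helper and the numbers are invented for illustration. The key names depend on your Keras version and metrics (older versions use "acc" and "val_acc").

```python
def accuracy_gap(history, train_key="accuracy", val_key="val_accuracy"):
    """Return (best_train, best_val, gap) from a Keras-style history dict."""
    best_train = max(history[train_key])
    best_val = max(history[val_key])
    return best_train, best_val, best_train - best_val

# Invented numbers standing in for `model.fit(...).history`.
history = {
    "accuracy": [0.30, 0.62, 0.85],
    "val_accuracy": [0.12, 0.16, 0.15],
}

best_train, best_val, gap = accuracy_gap(history)
print(f"train={best_train:.2f} val={best_val:.2f} gap={gap:.2f}")
```

A large gap points to overfitting, where dropout can help. If training accuracy is also stuck at a low value, the model is more likely underfitting, and adding dropout would probably make things worse.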

Hope this helps. All the best.
