I am training an encoder-decoder model for language translation, but val_acc fluctuates and never goes beyond 16%. I decided to add Dropout to reduce overfitting, but I can't work out where to put it.

Please help me add dropout to my code, shown below:

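# Imports assumed (tf.keras); adjust if you use standalone Keras.
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

# Encoder
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(num_encoder_tokens + 1, latent_dim, mask_zero=True)(encoder_inputs)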
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]


# Decoder
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens + 1, latent_dim, mask_zero=True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb,
                                     initial_state=encoder_states)

decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
desertnaut
smitshah99

1 Answer

What is the training accuracy? I am assuming that your training accuracy is high (>80%), because you say that the model is overfitting.
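One rough way to check this is to compare the final training and validation accuracy recorded during training. The sketch below assumes a dict shaped like the `history.history` attribute returned by Keras `model.fit`, with keys `acc` and `val_acc` (newer Keras versions name them `accuracy` and `val_accuracy`). The helper name `accuracy_gap` and the numbers are made up for illustration; they are not the asker's results.

```python
def accuracy_gap(history, train_key="acc", val_key="val_acc"):
    """Return final-epoch training accuracy minus validation accuracy."""
    return history[train_key][-1] - history[val_key][-1]


# Made-up example values, NOT real results from the question.
example_history = {
    "acc": [0.40, 0.70, 0.90],
    "val_acc": [0.15, 0.16, 0.16],
}

gap = accuracy_gap(example_history)
print(f"train/val accuracy gap: {gap:.2f}")
```

A large positive gap points towards overfitting. If both numbers are low, the model is more likely underfitting, and dropout would probably not help.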

If that is the case, i.e. the model really is overfitting, you can add dropout at several levels, for example:

  • Pre-dense layer
decoder_outputs, _, _ = decoder_lstm(dec_emb,
                                     initial_state=encoder_states)

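from tensorflow.keras.layers import Dropout  # assumed tf.keras import

# Randomly zero 50% of the LSTM outputs during training only.
dropout = Dropout(rate=0.5)
decoder_outputs = dropout(decoder_outputs)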

decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
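Besides the pre-dense placement above, dropout can also go after the embeddings and inside the LSTM layers, via the `dropout` and `recurrent_dropout` arguments of Keras `LSTM`. The following is a minimal sketch rather than a tuned setup: the vocabulary sizes, `latent_dim`, and all dropout rates are placeholder assumptions, and the imports assume `tensorflow.keras`.

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Dropout
from tensorflow.keras.models import Model

# Placeholder sizes, assumed for illustration only.
num_encoder_tokens = 1000
num_decoder_tokens = 1200
latent_dim = 64

# Encoder: dropout on the embeddings and inside the LSTM.
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(num_encoder_tokens + 1, latent_dim, mask_zero=True)(encoder_inputs)
enc_emb = Dropout(0.2)(enc_emb)
encoder_lstm = LSTM(latent_dim, return_state=True,
                    dropout=0.2, recurrent_dropout=0.2)
_, state_h, state_c = encoder_lstm(enc_emb)
encoder_states = [state_h, state_c]

# Decoder: same idea, plus the pre-dense dropout from the answer.
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(num_decoder_tokens + 1, latent_dim, mask_zero=True)(decoder_inputs)
dec_emb = Dropout(0.2)(dec_emb)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True,
                    dropout=0.2, recurrent_dropout=0.2)
decoder_outputs, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)
decoder_outputs = Dropout(0.5)(decoder_outputs)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
```

Note that a non-zero `recurrent_dropout` prevents Keras from using the fast cuDNN LSTM kernel, so training on a GPU may be noticeably slower. Dropout is only active during training, so the inference models built from these layers are unaffected.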

To choose where to add dropout, you need to find out why your model is overfitting. Are there too few training samples? Is the vocabulary size too small? Is the model learning the same constant behavior for all inputs?

Hope this helps. All the best.

mudit