
I initially defined an encoder-decoder model architecture for next-phrase prediction, trained it on some data, and was able to predict with it successfully. But after I inserted an attention layer into the architecture, the model trains fine, yet I am not able to define separate encoder and decoder models for prediction. Here is the new model architecture I defined:

# Model architecture along with an Attention layer
import tensorflow as tf
from tensorflow.keras.layers import (Input, Embedding, LSTM, Bidirectional,
                                     Concatenate, Dense, TimeDistributed, Dropout)
from tensorflow.keras.models import Model
from attention import AttentionLayer  # custom Bahdanau attention layer (assumed, e.g. thushv89/attention_keras)

# Create the Encoder layers first.
encoder_inputs = Input(shape=(len_input,))
encoder_emb = Embedding(input_dim=vocab_in_size, output_dim=embedding_dim)

# Bidirectional LSTM or Simple LSTM
encoder_lstm = Bidirectional(LSTM(units=units, return_sequences=True, return_state=True))
encoder_out, fstate_h, fstate_c, bstate_h, bstate_c = encoder_lstm(encoder_emb(encoder_inputs))
state_h = Concatenate()([fstate_h, bstate_h])  # concatenate forward and backward hidden states
state_c = Concatenate()([fstate_c, bstate_c])  # concatenate forward and backward cell states

encoder_states = [state_h, state_c]

# Now create the Decoder layers.
decoder_inputs = Input(shape=(None,))
decoder_emb = Embedding(input_dim=vocab_out_size, output_dim=embedding_dim)
decoder_lstm = LSTM(units=units*2, return_sequences=True, return_state=True)  # units*2 to match the concatenated bidirectional encoder states
decoder_lstm_out, _, _ = decoder_lstm(decoder_emb(decoder_inputs), initial_state=encoder_states)

# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_out, decoder_lstm_out])

# Concat attention input and decoder LSTM output
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_lstm_out, attn_out])

# Two dense layers
decoder_d1 = TimeDistributed(Dense(units, activation="relu"))
decoder_d2 = TimeDistributed(Dense(vocab_out_size, activation="softmax"))
decoder_out = decoder_d2(Dropout(rate=.2)(decoder_d1(Dropout(rate=.2)(decoder_concat_input))))
#decoder_out = decoder_d2(Dropout(rate=.2)(decoder_concat_input))

# Combine the encoder and decoder layers into the training model
model = Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_out)

model.compile(optimizer=tf.optimizers.Adam(), loss="sparse_categorical_crossentropy", metrics=['sparse_categorical_accuracy'])
model.summary()
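
For completeness, training looks roughly like this (a minimal sketch; encoder_input_data, decoder_input_data, and decoder_target_data are placeholder names for my padded id arrays, with the target sequence shifted one step ahead of the decoder input):

# Teacher-forced training sketch; the array names below are placeholders
model.fit([encoder_input_data, decoder_input_data],
          decoder_target_data,
          batch_size=64,
          epochs=10,
          validation_split=0.1)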

After training, I defined separate encoder and decoder models from the same tensors for inference:

# Changed inference models
# Build the encoder model from the tensors we declared during training
encoder_model = Model(encoder_inputs, [encoder_out, state_h, state_c], name='Encoder')

# decoder model
# Generate a new set of tensors for our new inference decoder
state_input_h = Input(shape=(units*2,), name="state_input_h")  # units*2 for the bidirectional encoder, else units
state_input_c = Input(shape=(units*2,), name="state_input_c")
# Placeholder for the encoder output sequence fed to attention; its last
# dimension must be units*2 to match encoder_out from the bidirectional encoder
inf_decoder_inputs = Input(shape=(len_input, units*2), name="inf_decoder_inputs")
# Reuse the trained decoder layers, seeded with states from the encoder model
decoder_res, decoder_h, decoder_c = decoder_lstm(decoder_emb(decoder_inputs),
                                                 initial_state=[state_input_h, state_input_c])

# Attention inference
attn_out_res, attn_states_res = attn_layer([inf_decoder_inputs, decoder_res])
# Concat attention input and decoder LSTM output
decoder_out_concat_res = Concatenate(axis=-1, name='inf_concat_layer')([decoder_res, attn_out_res])  # distinct name to avoid clashing with the training concat layer

inf_decoder_out = decoder_d2(decoder_d1(decoder_out_concat_res))

# Finalize the decoder model
inf_model = Model(inputs=[decoder_inputs, inf_decoder_inputs, state_input_h, state_input_c],
                  outputs=[inf_decoder_out, decoder_h, decoder_c], name='Decoder')
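
With these two models, prediction runs one step at a time, feeding each predicted token and the returned states back into the decoder. A rough sketch of my greedy decoding loop (word2idx_out, idx2word_out, the <start>/<end> tokens, and len_target are placeholder names from my preprocessing):

import numpy as np

def decode_sequence(input_seq):  # input_seq: int ids, shape (1, len_input)
    # Run the encoder once to get the attention memory and initial states
    enc_out, h, c = encoder_model.predict(input_seq)
    target_seq = np.array([[word2idx_out['<start>']]])  # seed with start token
    decoded = []
    for _ in range(len_target):
        out, h, c = inf_model.predict([target_seq, enc_out, h, c])
        token = int(np.argmax(out[0, -1, :]))
        if idx2word_out[token] == '<end>':
            break
        decoded.append(idx2word_out[token])
        target_seq = np.array([[token]])  # feed the prediction back in
    return ' '.join(decoded)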

The results after training have become worse, so I believe there is some problem with the model architecture. I arrived at this architecture after trying many permutations. Please go through the model architecture once.

  • Were you able to fix the issue? I have a similar problem. – user_12 Nov 30 '19 at 11:40
  • @user_12 This issue was fixed, but it gave rise to another one; I was able to add the attention layer to the current model architecture. Please let me know if you make any progress. – Mousam Singh Dec 03 '19 at 13:47
  • I was able to solve it by checking that each layer is connected in the correct way. If you can share the other issue, maybe I can assist you. – user_12 Dec 03 '19 at 21:33
  • Please go through my question above once more; I have made the required changes. @user_12 – Mousam Singh Dec 04 '19 at 10:00

0 Answers