I have built a TensorFlow Keras encoder-decoder seq2seq LSTM model. Its purpose is to predict answers to input sentences, essentially a chatbot.
I successfully created the model along with its inference models, and I managed to train it and generate responses with no problems. However, when trying to add attention to improve its performance, I could not get it to work. The type of attention I would prefer to add is Bahdanau attention, if possible.
I managed to understand how to add attention to the seq2seq training model, but I couldn't add it to the inference models, so this is my main problem.
For your information: the sentences I use as training data have a maximum length of 20 words. However, because the dataset is large, I use subword tokenization, so the maximum question and answer lengths are 495 and 176 tokens respectively, with a vocab size of 50000.
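In case it matters, my tokenization looks roughly like this (a minimal sketch assuming the HuggingFace tokenizers library, since I call tokenizer.get_vocab() below; the file path is a placeholder):
from tokenizers import Tokenizer

# Placeholder path: my trained subword tokenizer with a 50000-token vocab.
tokenizer = Tokenizer.from_file("tokenizer.json")
ids = tokenizer.encode("How are you today?").ids
# A 4-word sentence can map to more than 4 subword ids, which is why
# a 20-word question can span up to 495 tokens after tokenization.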
The code shown below trains with no problems; it is the creation of the inference models that does not work.
Here is where I define the model:
import tensorflow as tf

VOCAB_SIZE = len(tokenizer.get_vocab()) + 1
hidden_size = 512
embedding_dim = 200
max_question_len = 495
max_answer_len = 176

# Encoder: return_sequences=True so the attention layer can attend over
# every encoder timestep, not just the final output vector.
encoder_inputs = tf.keras.layers.Input(shape=(max_question_len,))
encoder_embedding = tf.keras.layers.Embedding(VOCAB_SIZE, embedding_dim, mask_zero=True)(encoder_inputs)
encoder_outputs, state_h, state_c = tf.keras.layers.LSTM(hidden_size, return_state=True, return_sequences=True)(encoder_embedding)
encoder_states = [state_h, state_c]

# Decoder: initialised with the encoder's final states (teacher forcing during training).
decoder_inputs = tf.keras.layers.Input(shape=(max_answer_len,))
decoder_embedding = tf.keras.layers.Embedding(VOCAB_SIZE, embedding_dim, mask_zero=True)(decoder_inputs)
decoder_lstm = tf.keras.layers.LSTM(hidden_size, return_state=True, return_sequences=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)

# Bahdanau (additive) attention: query = decoder outputs, value = encoder outputs.
attention = tf.keras.layers.AdditiveAttention()
context_vector = attention([decoder_outputs, encoder_outputs])
concatenated_outputs = tf.keras.layers.Concatenate(axis=-1)([context_vector, decoder_outputs])

decoder_dense = tf.keras.layers.Dense(VOCAB_SIZE, activation=tf.keras.activations.softmax)
output = decoder_dense(concatenated_outputs)

model = tf.keras.models.Model([encoder_inputs, decoder_inputs], output)
model.compile(optimizer=tf.keras.optimizers.Adam(clipvalue=1), loss='sparse_categorical_crossentropy')
print("Model compiled successfully!")
model.summary()
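As a quick sanity check on the attention shapes, this standalone snippet (dummy tensors, not part of the model) behaves as I expect:
import tensorflow as tf

query = tf.random.normal((2, 176, 512))  # decoder outputs: (batch, Tq, hidden)
value = tf.random.normal((2, 495, 512))  # encoder outputs: (batch, Tv, hidden)
context = tf.keras.layers.AdditiveAttention()([query, value])
print(context.shape)  # (2, 176, 512): one context vector per decoder timestep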
And here is where I create my inference models:
def make_inference_models():
    # Encoder model: maps a question to the encoder's final LSTM states.
    encoder_model = tf.keras.models.Model(encoder_inputs, encoder_states)

    # Decoder model: runs one decoding pass, fed the previous states as inputs.
    decoder_state_input_h = tf.keras.layers.Input(shape=(hidden_size,))
    decoder_state_input_c = tf.keras.layers.Input(shape=(hidden_size,))
    decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

    decoder_outputs, state_h, state_c = decoder_lstm(decoder_embedding, initial_state=decoder_states_inputs)

    # Attention between the decoder outputs and the encoder outputs.
    attention = tf.keras.layers.AdditiveAttention()
    context_vector = attention([decoder_outputs, encoder_outputs])
    concatenated_outputs = tf.keras.layers.Concatenate(axis=-1)([context_vector, decoder_outputs])

    decoder_states = [state_h, state_c]
    decoder_outputs = decoder_dense(concatenated_outputs)

    decoder_model = tf.keras.models.Model(
        [decoder_inputs] + decoder_states_inputs,
        [decoder_outputs] + decoder_states)

    return encoder_model, decoder_model
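Once the models build, I plan to decode with them roughly like this (a simplified sketch based on my pre-attention version; question_tokens, start_token_id, and end_token_id are placeholders from my preprocessing):
import numpy as np

enc_model, dec_model = make_inference_models()

# question_tokens: a tokenized, zero-padded question of shape (1, max_question_len).
states = enc_model.predict(question_tokens)

# The decoder input is fixed-length, so I keep a padded buffer and fill it step by step.
target_seq = np.zeros((1, max_answer_len))
target_seq[0, 0] = start_token_id  # placeholder: my tokenizer's start-of-sequence id

decoded_ids = []
for i in range(max_answer_len - 1):
    output, _, _ = dec_model.predict([target_seq] + states)
    token_id = int(np.argmax(output[0, i, :]))  # prediction for the current step
    if token_id == end_token_id:  # placeholder: my tokenizer's end-of-sequence id
        break
    decoded_ids.append(token_id)
    target_seq[0, i + 1] = token_id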
When creating the inference models, however, I get the error:
Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 495), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'") at layer "embedding". The following previous layers were accessed without issue: []
in the line:
decoder_model = tf.keras.models.Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
Any help will be much appreciated!