I'm building a seq2seq chatbot with LSTM models in Keras (for teaching purposes). To make the tensors homogeneous, I pad short phrases with a special padding_token (which, in my case, is equal to 0). When the network is trained on data padded with padding_token, it performs poorly on data without that token. So I decided to use a Masking layer to skip timesteps equal to padding_token. But, for some reason, the Masking layer does not work: its presence has no effect on the result at all.
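For context, the padding itself is straightforward; roughly this (the pad_batch helper here is just an illustration, my actual code is on GitHub):

```python
padding_token = 0

def pad_batch(sequences, padding_token=0):
    """Right-pad every token-ID sequence to the length of the longest one."""
    max_len = max(len(s) for s in sequences)
    return [s + [padding_token] * (max_len - len(s)) for s in sequences]

batch = pad_batch([[5, 12, 7], [3, 9], [14]])
print(batch)  # [[5, 12, 7], [3, 9, 0], [14, 0, 0]]
```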
I posted the code of the entire project on GitHub.
This is how the model initialization code looks:
#encoder
#####################################################################
encoder_input = layers.Input(shape=(1,), name="encoder_input")
encoder_masking = layers.Masking(mask_value=padding_token,
                                 name="encoder_masking")(encoder_input)
encoder_embedding = layers.Embedding(input_dim=count_tokens,
                                     output_dim=dim_embedding,
                                     name="encoder_embedding")(encoder_masking)
encoder_lstm_1, encoder_state_h_1, encoder_state_c_1 = \
    layers.LSTM(units=dim_lstm_1,
                return_sequences=True,
                return_state=True,
                name="encoder_lstm_1")(encoder_embedding)
encoder_lstm_2, encoder_state_h_2, encoder_state_c_2 = \
    layers.LSTM(units=dim_lstm_2,
                return_sequences=True,
                return_state=True,
                name="encoder_lstm_2")(encoder_lstm_1)
encoder_states_1 = [encoder_state_h_1, encoder_state_c_1]
encoder_states_2 = [encoder_state_h_2, encoder_state_c_2]
#decoder
#####################################################################
decoder_input = layers.Input(shape=(1,), name="decoder_input")
decoder_embedding = layers.Embedding(input_dim=count_tokens,
                                     output_dim=dim_embedding,
                                     name="decoder_embedding")(decoder_input)
decoder_lstm_1, _, _ = \
    layers.LSTM(units=dim_lstm_1,
                return_sequences=True,
                return_state=True,
                name="decoder_lstm_1")(decoder_embedding,
                                       initial_state=encoder_states_1)
decoder_lstm_2, _, _ = \
    layers.LSTM(units=dim_lstm_2,
                return_sequences=True,
                return_state=True,
                name="decoder_lstm_2")(decoder_lstm_1,
                                       initial_state=encoder_states_2)
decoder_dense = layers.Dense(units=dim_dense,
                             activation="sigmoid",
                             name="decoder_dense")(decoder_lstm_2)
#####################################################################
encoder_decoder_model = keras.Model(inputs=[encoder_input, decoder_input],
                                    outputs=decoder_dense)
I use this model purely for training; afterwards I split it into two separate encoder and decoder models.
P.S. If I set mask_zero=True on the Embedding layer instead, the mask really does take effect (which confirms that the rest of the code is correct and that masking can work here). Although mask_zero solves my immediate problem, I would still like to understand why the Masking layer has no effect, because mask_zero can only hide zeros, while Masking lets you choose any mask_value.
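To be explicit about what I expect: as far as I understand, mask_zero=True makes the Embedding layer compute a boolean mask that is False exactly where the token ID is 0, and I wanted Masking(mask_value=padding_token) to produce the same kind of mask for an arbitrary padding value. A quick NumPy illustration of the mask I expect (not Keras code, just the concept):

```python
import numpy as np

padding_token = 0

# A padded batch of token IDs (post-padded with padding_token)
batch = np.array([[5, 12, 7],
                  [3, 9, 0],
                  [14, 0, 0]])

# The mask I expect the masking layer to produce:
# True = real token, False = padding the LSTM should skip
mask = batch != padding_token
print(mask)
```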
Please help me, I will be very grateful :)