I am trying to implement a bidirectional LSTM for text summarization. I have an issue with the inference section: the dimensions do not match. This is my model:

from tensorflow.keras.layers import Input, Embedding, LSTM, Bidirectional, Concatenate, TimeDistributed, Dense
from tensorflow.keras.models import Model
# AttentionLayer is a custom attention layer, not a built-in Keras layer.

latent_dim = 300
embedding_dim = 100

# Encoder
encoder_inputs = Input(shape=(max_news_len,))

#embedding layer
enc_emb = Embedding(x_voc, embedding_dim, trainable=True)(encoder_inputs)

#encoder lstm 1
encoder_bi_lstm1 = Bidirectional(LSTM(latent_dim,
                                   return_sequences=True,
                                   return_state=True,
                                   dropout=0.4,
                                   recurrent_dropout=0.4), 
                                 merge_mode="concat")
encoder_output1, forward_state_h1, forward_state_c1, backward_state_h1, backward_state_c1 = encoder_bi_lstm1(enc_emb)
encoder_states1 = [forward_state_h1, forward_state_c1, backward_state_h1, backward_state_c1]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))

#embedding layer
dec_emb_layer = Embedding(y_voc, embedding_dim, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)

#decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True,dropout=0.4,recurrent_dropout=0.2)
#decoder_outputs,decoder_fwd_state, decoder_back_state = decoder_lstm(dec_emb,initial_state=[state_h, state_c])

decoder_bi_lstm = Bidirectional(LSTM(latent_dim, 
                                  return_sequences=True, 
                                  return_state=True,
                                  dropout=0.4,
                                  recurrent_dropout=0.2),
                             merge_mode="concat")
decoder_outputs, decoder_fwd_state_h1, decoder_fwd_state_c1, decoder_back_state_h1, decoder_back_state_c1 = decoder_bi_lstm(dec_emb, initial_state=encoder_states1)
decoder_states = [decoder_fwd_state_h1, decoder_fwd_state_c1, decoder_back_state_h1, decoder_back_state_c1]

# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_output1, decoder_outputs])

# Concat attention input and decoder LSTM output
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attn_out])

#dense layer
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)

# Define the model 
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

model.summary() 

This is my inference setup:

# Encode the input sequence to get the feature vector
encoder_model = Model(inputs=encoder_inputs, outputs=encoder_states1)

# Decoder setup
# Below tensors will hold the states of the previous time step
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_hidden_state_input = Input(shape=(max_news_len,latent_dim))

# Get the embeddings of the decoder sequence
dec_emb2 = dec_emb_layer(decoder_inputs)
# To predict the next word in the sequence, set the initial states to the states from the previous time step
decoder_outputs2, decoder_fwd_state_h2, decoder_fwd_state_c2, decoder_back_state_h2, decoder_back_state_c2 = decoder_bi_lstm(dec_emb2, initial_state=decoder_states)
decoder_states2 = [decoder_fwd_state_h2, decoder_fwd_state_c2, decoder_back_state_h2, decoder_back_state_c2]

#attention inference
attn_out_inf, attn_states_inf = attn_layer([decoder_hidden_state_input, decoder_outputs2])
decoder_inf_concat = Concatenate(axis=-1, name='concat')([decoder_outputs2, attn_out_inf])

# A dense softmax layer to generate prob dist. over the target vocabulary
decoder_outputs2 = decoder_dense(decoder_inf_concat) 

# Final decoder model
decoder_model = Model(
    [decoder_inputs] + [decoder_hidden_state_input,decoder_state_input_h, decoder_state_input_c],
    [decoder_outputs2] + [decoder_fwd_state_h2, decoder_fwd_state_c2, decoder_back_state_h2, decoder_back_state_c2])

The error is: Dimensions must be equal, but are 300 and 600 for 'attention_layer_6/MatMul' (op: 'MatMul') with input shapes: [?,300], [600,600].

1 Answer

I know I am late, but I just found an answer to this problem. I am using the same encoder-decoder architecture with bidirectional LSTMs.

When building the decoder for inference, you need to create four separate Input tensors for the four initial states that the encoder passes to the decoder: the forward and backward hidden states and the forward and backward cell states.

dec_h_state_f = tf.keras.layers.Input(shape=(latent_dim,))
dec_h_state_r = tf.keras.layers.Input(shape=(latent_dim,))

dec_c_state_f = tf.keras.layers.Input(shape=(latent_dim,))
dec_c_state_r = tf.keras.layers.Input(shape=(latent_dim,))


# Create the hidden input layer with twice the latent dimension:
# with merge_mode="concat", the bidirectional encoder's sequence
# output stacks the forward and backward outputs, so it is
# latent_dim * 2 = 600 features wide. That is the 600 in the error
# message; the original decoder_hidden_state_input was only 300 wide.

dec_hidden_inp = tf.keras.layers.Input(shape=(max_news_len, latent_dim * 2))
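
The final model below also references dec_input, dec_out_infer and dec_states, which are not shown above. Here is a minimal sketch of that middle step, assuming it reuses the question's trained layers (dec_emb_layer, decoder_bi_lstm, attn_layer, decoder_dense) and the question's decoder_inputs placeholder:

# Reuse the question's decoder input placeholder and trained layers.
dec_input = decoder_inputs
dec_emb_inf = dec_emb_layer(dec_input)

# Bidirectional splits initial_state in half: the first two tensors
# go to the forward LSTM, the last two to the backward LSTM.
dec_out_inf, f_h, f_c, b_h, b_c = decoder_bi_lstm(
    dec_emb_inf,
    initial_state=[dec_h_state_f, dec_c_state_f, dec_h_state_r, dec_c_state_r])
dec_states = [f_h, f_c, b_h, b_c]

# Attention now sees matching widths: the encoder sequence output
# (dec_hidden_inp) and the decoder output are both latent_dim * 2 wide.
attn_out_inf, attn_states_inf = attn_layer([dec_hidden_inp, dec_out_inf])
dec_concat_inf = tf.keras.layers.Concatenate(axis=-1)([dec_out_inf, attn_out_inf])
dec_out_infer = decoder_dense(dec_concat_inf)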

Finally, you can build the inference decoder model as below, which works for me.

dec_model = tf.keras.models.Model(
    [dec_input] + [dec_hidden_inp, dec_h_state_f, dec_h_state_r, dec_c_state_f, dec_c_state_r],
    [dec_out_infer] + dec_states)  # dec_states is already a list, so don't wrap it again

And don't forget to split those four state tensors out of the encoder's outputs during inference before feeding them into the decoder for the first step.
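
For example, a greedy decoding loop could look like the sketch below. This is not the poster's code: it assumes the encoder model is rebuilt as tf.keras.models.Model(encoder_inputs, [encoder_output1] + encoder_states1) so that it also returns the sequence output the attention layer needs, that input_seq is a padded source sequence, that max_summary_len caps the output length, and that the 'sostok'/'eostok' tokens and the target_word_index/reverse_target_word_index lookups exist as in the usual preprocessing.

import numpy as np

# Encode once; unpack the sequence output plus the four states
# (forward h, forward c, backward h, backward c).
enc_out, h_f, c_f, h_r, c_r = encoder_model.predict(input_seq)

target_seq = np.zeros((1, 1))
target_seq[0, 0] = target_word_index['sostok']  # assumed start token

decoded = []
while len(decoded) < max_summary_len:
    # Input order matches dec_model above: h states first, then c
    # states, forward before reverse.
    out, f_h, f_c, b_h, b_c = dec_model.predict(
        [target_seq, enc_out, h_f, h_r, c_f, c_r])
    idx = np.argmax(out[0, -1, :])
    word = reverse_target_word_index[idx]
    if word == 'eostok':  # assumed end token
        break
    decoded.append(word)
    target_seq[0, 0] = idx
    h_f, c_f, h_r, c_r = f_h, f_c, b_h, b_c  # carry states forward

print(' '.join(decoded))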
