
I made a model for text summarization with a BLSTM architecture and global attention. I have x_vocab_size = 36782 and y_vocab_size = 19749.
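
The code below uses tf.keras; the imports are roughly these (reconstructed here for completeness, not copied verbatim from my script):

import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, Concatenate, Dense, TimeDistributed
from tensorflow.keras.models import Model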

This is the full model I've made:

latent_dim = 300 

# Encoder 
encoder_inputs = Input(shape=(None,), dtype='int32', name='input_text') 
enc_emb = Embedding(x_vocab_size, latent_dim, name='text_embedding', trainable=True)(encoder_inputs) 

# BLSTM Layer
encoder_LSTM = LSTM(latent_dim, return_sequences=True, return_state=True)
encoder_LSTM_R = LSTM(latent_dim, return_sequences=True, return_state=True, go_backwards=True)
encoder_output, forward_h, forward_c = encoder_LSTM(enc_emb)
encoder_outputr, backward_h, backward_c = encoder_LSTM_R(enc_emb)

encoder_outputs = Concatenate()([encoder_output, encoder_outputr])

encoder_states = [forward_h, forward_c, backward_h, backward_c]

# Decoder 
decoder_inputs = Input(shape=(None,), name='input_summary') 
dec_emb_layer = Embedding(y_vocab_size, latent_dim, name='summary_embedding', trainable=True)
dec_emb = dec_emb_layer(decoder_inputs) 

# LSTM using encoder_states as initial state
decoder_LSTM = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_LSTM_R = LSTM(latent_dim, return_sequences=True, return_state=True, go_backwards=True)
decoder_output, decforward_h, decforward_c = decoder_LSTM(dec_emb, initial_state=[forward_h, forward_c])
decoder_outputr, decbackward_h, decbackward_c = decoder_LSTM_R(dec_emb, initial_state=[backward_h, backward_c]) 

decoder_outputs = Concatenate()([decoder_output, decoder_outputr])

decoder_states = [decforward_h, decforward_c, decbackward_h, decbackward_c]

# Attention Layer
#attn_layer = AttentionLayer(name='attention_layer') 
#attn_out, attn_states = attn_layer([encoder_outputs, decoder_outputs]) 
attn_out = tf.keras.layers.Attention()([encoder_outputs, decoder_outputs]) 

# Concat attention output and decoder BLSTM output 
decoder_concat_input = Concatenate(axis=-1, name='dec_concat_layer')([decoder_outputs, attn_out])

# Dense layer
decoder_dense = TimeDistributed(Dense(y_vocab_size, activation='softmax')) 
decoder_outputs = decoder_dense(decoder_concat_input) 

# Model Definition
model = Model([encoder_inputs, decoder_inputs], [decoder_outputs]) 
model.summary()
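
Before building the inference model I compile and train this model. The exact training code isn't shown here, but it is along these lines (the optimizer, loss, and the padded arrays x_tr / y_tr / x_val / y_val are placeholders, not my actual settings):

# Sketch of the training step (names and hyperparameters are assumptions)
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
model.fit([x_tr, y_tr[:, :-1]],
          y_tr.reshape(y_tr.shape[0], y_tr.shape[1], 1)[:, 1:],
          epochs=10, batch_size=64,
          validation_data=([x_val, y_val[:, :-1]],
                           y_val.reshape(y_val.shape[0], y_val.shape[1], 1)[:, 1:]))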

And this is the inference model:

# Encoder Inference
encoder_model = Model(inputs=encoder_inputs,outputs=encoder_states)

# Decoder Inference
# Below tensors hold the states of the previous time step
decoder_state_input_h = Input(shape=(None,300))
decoder_state_input_c = Input(shape=(None,300))

decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

# Getting decoder sequence embeddings
dec_emb2 = dec_emb_layer(decoder_inputs)

# Predicting the next word in the sequence
# Setting the initial states to the previous time step states
decoder_outputs2, state_h2, state_c2 = decoder_LSTM(dec_emb2, initial_state=decoder_states_inputs)
decoder_outputsb2, state_hb2, state_cb2 = decoder_LSTM(dec_emb2, initial_state=decoder_states_inputs)

decoder_outputs3 = Concatenate()([decoder_outputs2, decoder_outputsb2])

# Attention Inference
#attn_out_inf, attn_states_inf = attn_layer([decoder_hidden_state_input, decoder_outputs2])
attn_out_inf = tf.keras.layers.Attention()([encoder_outputs, decoder_outputs3]) 
decoder_inf_concat = Concatenate(axis=-1, name='concat')([decoder_outputs3, attn_out_inf])

# Dense softmax layer to calculate probability distribution over target vocab
decoder_outputs2 = decoder_dense(decoder_inf_concat)

# Final Decoder model
decoder_model = Model(
[decoder_inputs]+[decoder_hidden_state_input],
[decoder_outputs2])

I get this error when I run the inference model:

ValueError: Exception encountered when calling layer "lstm_2" (type LSTM).

Dimensions must be equal, but are 1200 and 300 for '{{node mul}} = Mul[T=DT_FLOAT](Sigmoid_1, init_c)' with input shapes: [?,?,1200], [?,?,300].

Call arguments received by layer "lstm_2" (type LSTM):
  • inputs=['tf.Tensor(shape=(None, None, 300), dtype=float32)', 'tf.Tensor(shape=(None, None, 300), dtype=float32)', 'tf.Tensor(shape=(None, None, 300), dtype=float32)']
  • mask=None
  • training=False
  • initial_state=None

What is wrong with my model, and how can I fix it and use it to generate the predicted summary?
