
I am trying to implement a sequence-to-sequence model with attention using the Keras library. The block diagram of the model is as follows:

[Block diagram of the model]

The model embeds the input sequence into 3D tensors. Then a bidirectional LSTM creates the encoding layer. Next, the encoded sequences are sent to a custom attention layer that returns a 2D tensor containing the attention weight for each hidden node.

The decoder input is injected into the model as one-hot vectors. In the decoder (another bi-LSTM), both the decoder input and the attention weights are passed as input. The output of the decoder is sent to a time-distributed dense layer with the softmax activation function to get a probability distribution over the output for every time step. The code of the model is as follows:

from keras.layers import Input, Embedding, Bidirectional, LSTM, TimeDistributed, Dense, concatenate

encoder_input = Input(shape=(MAX_LENGTH_Input,))
embedded = Embedding(input_dim=vocab_size_input, output_dim=embedding_width, trainable=False)(encoder_input)
encoder = Bidirectional(LSTM(units=hidden_size, input_shape=(MAX_LENGTH_Input, embedding_width), return_sequences=True, dropout=0.25, recurrent_dropout=0.25))(embedded)

# Custom attention layer (returns a 2D tensor of attention weights)
attention = Attention(MAX_LENGTH_Input)(encoder)

decoder_input = Input(shape=(MAX_LENGTH_Output, vocab_size_output))
merge = concatenate([attention, decoder_input])
decoder = Bidirectional(LSTM(units=hidden_size, input_shape=(MAX_LENGTH_Output, vocab_size_output)))(merge)
output = TimeDistributed(Dense(MAX_LENGTH_Output, activation="softmax"))(decoder)

The problem arises when I concatenate the attention output and the decoder input. Since the decoder input is a 3D tensor whereas the attention output is a 2D tensor, Concatenate raises the following error:

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 1024), (None, 10, 8281)]

How can I convert a 2D Attention tensor into a 3D tensor?


1 Answer


Based on your block diagram, it looks like you pass the same attention vector to the decoder at every timestep. In that case, you need RepeatVector to copy that vector across the timesteps, which converts the 2D attention tensor into a 3D tensor:

# ...
attention = Attention(MAX_LENGTH_Input)(encoder)
attention = RepeatVector(MAX_LENGTH_Output)(attention)   # (?, 10, 1024)
decoder_input = Input(shape=(MAX_LENGTH_Output, vocab_size_output))
merge = concatenate([attention, decoder_input])          # (?, 10, 1024 + 8281)
# ...

Take note that this will repeat the same attention vector for every timestep.
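
If it helps, here is a minimal, self-contained sketch of the repeat-and-concatenate step in isolation. The sizes 1024 and 8281 are placeholders taken from the shapes in your error message, and the imports assume the standalone keras package:

from keras.layers import Input, RepeatVector, concatenate
from keras.models import Model

context = Input(shape=(1024,))                 # 2D: (batch, 1024), like the attention output
step_inputs = Input(shape=(10, 8281))          # 3D: (batch, 10, 8281), like the decoder input
repeated = RepeatVector(10)(context)           # now 3D: (batch, 10, 1024)
merged = concatenate([repeated, step_inputs])  # (batch, 10, 1024 + 8281)

Model([context, step_inputs], merged).summary()  # prints the expanded shapes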

nuric
  • Thank you very much for the solution. It worked perfectly. But I have a query: I was supposed to get the output in the shape (?, max_len, vocab_size), right? But I am getting it in the shape (?, max_len, max_len). Where did I make a mistake? – C M Khaled Saifullah Jul 25 '18 at 20:43
  • I got it. I made a mistake while building the time-distributed dense layer. It should be: output = TimeDistributed(Dense(vocab_size_output, activation="softmax"))(decoder) – C M Khaled Saifullah Jul 25 '18 at 21:04
  • Where did you get "Attention"? – James Chang Jul 30 '18 at 16:11
  • I tried attention.py from Raffel et al. [https://arxiv.org/abs/1512.08756], but I got a value error: ValueError: ('Could not interpret regularizer identifier:', 15). Did you use the same attention.py, and did you have the same error? – James Chang Jul 30 '18 at 16:54
  • @JamesChang Here is the link to the code that worked for me: https://drive.google.com/open?id=1Ou_IaaGzoHZENhFfZZNhKvgy0Vym7JS7 – C M Khaled Saifullah Aug 10 '18 at 22:11
  • @CMKhaledSaifullah Thanks, I'll give it a try! – James Chang Aug 11 '18 at 15:41
  • I just replaced (input_shape[-1],) with shape=(input_shape[-1],) in all the *self.add_weight()* calls in the Attention class... And it seems that *return_sequences=True* should be added, as in *decoder = layers.Bidirectional(layers.LSTM(units=hidden_size, return_sequences=True), input_shape=(MAX_LENGTH_Output,vocab_size_output))(merge)* -- this works for me. Thanks for the class with *mask* – JeeyCi May 23 '22 at 10:43
  • BTW, TimeDistributedDense is already deprecated. Since Keras 2.0, Dense can handle >2-dimensional tensors well. – JeeyCi May 23 '22 at 15:46