
I'm trying to implement a simple word-level sequence-to-sequence model with Keras in Colab. I'm using the Keras Attention layer. Here is the definition of the model:

from tensorflow.keras.layers import Input, Embedding, LSTM, Attention, Concatenate, TimeDistributed, Dense
from tensorflow.keras.models import Model

embedding_size = 200
UNITS = 128

encoder_inputs = Input(shape=(None,), name="encoder_inputs")

encoder_embs=Embedding(num_encoder_tokens, embedding_size, name="encoder_embs")(encoder_inputs)

#encoder lstm
encoder = LSTM(UNITS, return_state=True, name="encoder_LSTM") #(encoder_embs)
encoder_outputs, state_h, state_c = encoder(encoder_embs)

encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None,), name="decoder_inputs")
decoder_embs = Embedding(num_decoder_tokens, embedding_size, name="decoder_embs")(decoder_inputs)

#decoder lstm
decoder_lstm = LSTM(UNITS, return_sequences=True, return_state=True, name="decoder_LSTM")
decoder_outputs, _, _ = decoder_lstm(decoder_embs, initial_state=encoder_states)

attention=Attention(name="attention_layer")
attention_out=attention([encoder_outputs, decoder_outputs])

decoder_concatenate=Concatenate(axis=-1, name="concat_layer")([decoder_outputs, attention_out])
decoder_outputs = TimeDistributed(Dense(units=num_decoder_tokens, 
                                  activation='softmax', name="decoder_denseoutput"))(decoder_concatenate)

model=Model([encoder_inputs, decoder_inputs], decoder_outputs, name="s2s_model")
model.compile(optimizer='RMSprop', loss='categorical_crossentropy', metrics=['accuracy'])

model.summary()

Model compiling is fine, no problems whatsoever. The encoder and decoder input and output shapes are:

Encoder training input shape:  (4000, 21)
Decoder training input shape:  (4000, 12)
Decoder training target shape:  (4000, 12, 3106)
--
Encoder test input shape:  (385, 21)

This is the model.fit code:

model.fit([encoder_training_input, decoder_training_input], decoder_training_target,
          epochs=100,
          batch_size=32,
          validation_split=0.2)

When I run the fit phase, I get this error from the Concatenate layer:

ValueError: Dimension 1 in both shapes must be equal, but are 12 and 32. 
Shapes are [32,12] and [32,32]. for '{{node s2s_model/concat_layer/concat}} = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32](s2s_model/decoder_LSTM/PartitionedCall:1,
s2s_model/attention_layer/MatMul_1, s2s_model/concat_layer/concat/axis)' with input shapes: [32,12,128], [32,32,128], [] and with computed input tensors: input[2] = <2>.

So the first 32 is the batch size, 128 is the output dimension of decoder_outputs and attention_out, and 12 is the number of decoder input tokens. I can't work out how to solve this error, and I don't think I can change the number of input tokens. Any suggestions?


2 Answers


Replace axis=-1 with axis=1 in the concatenation layer. The example in the Concatenate documentation should clarify why.

Your problem resides in the inputs passed to the concatenation. You need to specify the right axis when concatenating two differently shaped tensors. The shapes [32, 12, 128] and [32, 32, 128] differ in the second dimension, which you select by passing 1 (dimensions are indexed from 0). Concatenating along that axis produces the shape [32, (12+32), 128], i.e. the two tensors are joined along the second dimension.

When you specify axis=-1 (the default), the layer concatenates along the last dimension instead, which requires every other dimension to match; in your case this fails because the second dimensions (12 and 32) differ.
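To make the shape arithmetic concrete, here is a minimal standalone sketch using dummy random tensors with the shapes from the error message (not the actual model outputs):

import tensorflow as tf

# Dummy stand-ins for the two tensors in the error message.
decoder_out = tf.random.normal((32, 12, 128))    # [batch, 12, 128]
attention_out = tf.random.normal((32, 32, 128))  # [batch, 32, 128]

# axis=1 joins the second dimension: 12 + 32 = 44.
merged = tf.keras.layers.Concatenate(axis=1)([decoder_out, attention_out])
print(merged.shape)  # (32, 44, 128)

# axis=-1 (the default) would join the last dimension instead, which requires
# the second dimensions to match; here 12 != 32, hence the ValueError.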

Majitsima
  • Links sometimes get broken; it is better to include the explanation here as well. – Igna Oct 25 '21 at 12:57
  • Thank you for pointing that out, I'll sort it out. – Majitsima Oct 25 '21 at 13:00
  • Thank you for your answer. I tried it, and now I get this error: ValueError: Shapes (32, 12, 3106) and (32, 44, 3106) are incompatible – Gianni Pinotti Oct 25 '21 at 13:05
  • I would guess your labels are not one-hot encoded, since this looks like a standard NMT network, right? If so, try changing the loss function to loss='sparse_categorical_crossentropy'. – Majitsima Oct 25 '21 at 13:32
  • Yes, it's a pretty standard NMT network, and the labels are one-hot encoded. I tried the sparse categorical crossentropy loss, and I also tried adding two `GlobalAveragePooling1D` layers as in this Keras example [link](https://keras.io/api/layers/attention_layers/attention/), but I still get the same error... Bad luck for me today :( – Gianni Pinotti Oct 25 '21 at 15:08
  • What are your source and target languages? It seems a bit odd that you feed 12 tokens into the decoder and 21 into the encoder. The languages I used had a much tighter average word count per line; I usually just fix the line length at the maximum input and pad shorter sequences with a designated empty symbol. – Majitsima Oct 25 '21 at 17:34
  • Actually, I'm trying to implement a text style transfer s2s network: input sentences are "modern" Shakespeare, target sentences are "old" Shakespeare (the dataset is aligned; for every modern sentence there is a corresponding old sentence, dataset [here](https://github.com/cocoxu/Shakespeare/tree/master/data/align/plays/merged)). I'm a beginner, so I thought an s2s model could work for text style transfer... Any suggestion is more than welcome – Gianni Pinotti Oct 26 '21 at 07:51
  • Set the axis back to -1 and change the order in which you input decoder_output and encoder_output into the attention layer. – Majitsima Oct 26 '21 at 20:34
  • Reasoning: check [this](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention#call_args) out. The query shape `[batch_size, Tq, dim]` and the output shape `[batch_size, Tq, dim]` are aligned, so the attention weights are mapped properly onto the following layers of the decoder. – Majitsima Oct 26 '21 at 20:51
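A quick standalone check of that rule, using dummy tensors whose lengths (Tq = 12, Tv = 21) mirror the question's decoder and encoder sequence lengths:

import tensorflow as tf

query = tf.random.normal((32, 12, 128))   # [batch_size, Tq, dim]
value = tf.random.normal((32, 21, 128))   # [batch_size, Tv, dim]

# The output keeps the query's time dimension Tq, not the value's Tv.
out = tf.keras.layers.Attention()([query, value])
print(out.shape)  # (32, 12, 128)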

Solved this thanks to @Majitsima. I swapped the inputs to the Attention layer, so instead of

attention=Attention(name="attention_layer")
attention_out=attention([encoder_outputs, decoder_outputs])

the input is

attention=Attention(name="attention_layer")
attention_out=attention([decoder_outputs, encoder_outputs])

with

decoder_concatenate=Concatenate(axis=-1, name="concat_layer")([decoder_outputs, attention_out])

Everything seems to work now, so thank you again @Majitsima, hope this can help!
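For reference, a rough shape check of the corrected ordering on dummy tensors (not the real model). It assumes the encoder LSTM returns its full output sequence so that encoder_outputs has 21 timesteps, and it reuses the 12/21/3106 sizes from the question:

import tensorflow as tf

num_decoder_tokens = 3106
decoder_outputs = tf.random.normal((32, 12, 128))   # query: [batch, Tq=12, 128]
encoder_outputs = tf.random.normal((32, 21, 128))   # value: [batch, Tv=21, 128]

attention_out = tf.keras.layers.Attention()([decoder_outputs, encoder_outputs])
print(attention_out.shape)  # (32, 12, 128) -- matches decoder_outputs along axis 1

merged = tf.keras.layers.Concatenate(axis=-1)([decoder_outputs, attention_out])
print(merged.shape)         # (32, 12, 256)

probs = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(num_decoder_tokens, activation='softmax'))(merged)
print(probs.shape)          # (32, 12, 3106) -- lines up with the target shape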