
Is there a way to use the native TensorFlow Attention layer with the Keras Sequential API?

I'm looking to use this particular class. I have found custom implementations such as this one, but what I'm really after is using the built-in class with the Sequential API.

Here's a code example of what I'm looking for:

model = tf.keras.models.Sequential()

model.add(tf.keras.layers.Embedding(vocab_length, EMBEDDING_DIM,
                                    input_length=MAX_SEQUENCE_LENGTH,
                                    weights=[embedding_matrix], trainable=False))
model.add(tf.keras.layers.Dropout(0.3))

model.add(tf.keras.layers.Conv1D(64, 5, activation='relu'))
model.add(tf.keras.layers.MaxPooling1D(pool_size=4))

model.add(tf.keras.layers.CuDNNLSTM(100))
model.add(tf.keras.layers.Dropout(0.4))

model.add(tf.keras.layers.Attention())  # Doesn't work this way

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
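
For context, tf.keras.layers.Attention expects a list of [query, value] tensors rather than a single input, which seems to be why it can't just be stacked in a Sequential model. A minimal sketch of how it could be wired with the functional API instead (self-attention over the recurrent outputs, reusing the constants from the snippet above; sizes are placeholders):

import tensorflow as tf

# Functional-API sketch, not a Sequential solution: the built-in Attention layer
# takes [query, value], so the recurrent outputs attend over themselves here.
inputs = tf.keras.Input(shape=(MAX_SEQUENCE_LENGTH,))
x = tf.keras.layers.Embedding(vocab_length, EMBEDDING_DIM,
                              weights=[embedding_matrix], trainable=False)(inputs)
x = tf.keras.layers.LSTM(100, return_sequences=True)(x)   # keep the time dimension
attended = tf.keras.layers.Attention()([x, x])             # query = value = LSTM outputs
pooled = tf.keras.layers.GlobalAveragePooling1D()(attended)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(pooled)
model = tf.keras.Model(inputs, outputs)

What I'd still like to know is whether something equivalent is possible while staying with the Sequential API.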
  • I implemented an attention layer for Keras (found [here](https://github.com/thushv89/attention_keras)). I've only tested this for seq2seq models, but you should be able to get it working for your model (probably with minimal changes). – thushv89 Dec 12 '19 at 22:15
  • Thank you thushv89. But I'm not quite sure your link uses the Sequential API; I think it uses the Functional API, but I'm a bit of a beginner, so I might be wrong. – Wajd Meskini Dec 13 '19 at 06:53
  • You are correct to a certain extent. I am not using the Sequential API, so if that is what you're after, my code probably won't help. I am using the subclassing API to create the layer and the functional API to demonstrate the model with the AttentionLayer. – thushv89 Dec 13 '19 at 10:01

1 Answer


I ended up using a custom class I found in this repository by tsterbak: the AttentionWeightedAverage class. It is compatible with the Sequential API. Here's my model for reference:

# Imports assuming tf.keras; adjust if you use standalone Keras.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Conv1D, MaxPooling1D,
                                     Bidirectional, GRU, Dense)
# AttentionWeightedAverage is the custom layer from tsterbak's repository linked above.

model = Sequential()

model.add(Embedding(input_dim=vocab_length,
                    output_dim=EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH,
                    weights=[embedding_matrix], trainable=False))
model.add(Conv1D(64, 5, activation='relu'))
model.add(MaxPooling1D(pool_size=4))

model.add(Bidirectional(GRU(100, return_sequences=True)))

model.add(AttentionWeightedAverage())
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Note that this is what is called "soft attention" or "attention with weighted average", as described in "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention". The details are explained more accessibly here.
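
If you don't want to pull in the whole repository, the idea behind such a layer is simple: a single learned weight vector scores each time step, a softmax turns those scores into attention weights, and the output is the weighted average of the sequence. A rough sketch of that mechanism (not the exact class from the linked repository) could look like this:

import tensorflow as tf

class SimpleAttentionWeightedAverage(tf.keras.layers.Layer):
    """Soft-attention pooling sketch: score each time step, softmax, weighted average."""

    def build(self, input_shape):
        # One learned vector that maps each time step's features to a scalar score.
        self.w = self.add_weight(name="att_weight",
                                 shape=(int(input_shape[-1]), 1),
                                 initializer="glorot_uniform",
                                 trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        # inputs: (batch, time_steps, features), e.g. the Bidirectional GRU output above
        scores = tf.tensordot(inputs, self.w, axes=1)    # (batch, time_steps, 1)
        weights = tf.nn.softmax(scores, axis=1)          # attention weights over time
        return tf.reduce_sum(inputs * weights, axis=1)   # (batch, features)

Because it maps (batch, time_steps, features) to (batch, features), a layer like this can be dropped into a Sequential model after a recurrent layer with return_sequences=True, just like AttentionWeightedAverage above.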
