I want to implement a simple attention mechanism to ensemble the results of a CNN model.

Concretely, each example of my input is a sequence of images, so each example has shape [None, img_width, img_height, n_channels], where None is the (variable) sequence length.

Using a TimeDistributed wrapper, I can apply my CNN so that I get an output of shape [None, hidden_state_size].

I want to apply a CNN to each image in the sequence, and then compute an attention vector of shape [None] (one weight per image). To do this I run the output of the TimeDistributed CNN through a TimeDistributed Dense layer with a single output unit, and compute the softmax over the sequence.

The attention vector should then be multiplied by the output of the TimeDistributed CNN and everything should be summed, so that we end up with a tensor of shape [hidden_state_size].

The resulting code is this:

import tensorflow.keras as keras
import tensorflow.keras.layers as ll

inputs = ll.Input([None, 28, 28, 3])            # (seq_len, height, width, channels)
x = inputs
x = ll.TimeDistributed(ll.Flatten())(x)         # Flatten stands in for the CNN: (seq_len, 2352)
attention = ll.TimeDistributed(ll.Dense(1))(x)  # one score per image: (seq_len, 1)
attention = ll.Flatten()(attention)             # (seq_len,)
attention = ll.Softmax()(attention)             # normalise the scores over the sequence
outputs = ll.dot([x, attention], axes=[-2, -1]) # weighted sum over time: (2352,)

model = keras.models.Model(inputs, outputs)

The dimensions of this model seem to check out, but will this do what I want? Or have I made a mistake somewhere?
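For reference, here is the computation I intend, sketched in plain NumPy (the sizes are made up for illustration). The einsum contracts the sequence axis of the features (axis -2) with the sequence axis of the weights (axis -1), which is what I expect Dot with axes=[-2, -1] to do:

```python
import numpy as np

# Hypothetical sizes: batch of 2, sequence of 4 images, hidden size 5.
batch, seq_len, hidden = 2, 4, 5
x = np.random.rand(batch, seq_len, hidden)  # TimeDistributed CNN output
scores = np.random.rand(batch, seq_len)     # Dense(1) scores, flattened

# Softmax over the sequence axis.
e = np.exp(scores - scores.max(axis=1, keepdims=True))
attention = e / e.sum(axis=1, keepdims=True)

# Weighted sum over time: contract the sequence axis of x with the
# sequence axis of attention, leaving one hidden vector per example.
outputs = np.einsum('btf,bt->bf', x, attention)

assert outputs.shape == (batch, hidden)
assert np.allclose(attention.sum(axis=1), 1.0)
```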

Jsevillamol