
I have the following network architecture (only the relevant part of the network is shown below):

from keras import backend as K
from keras.layers import Input, Embedding, Lambda, Dense

vocab_dimension = 1500
embed_dimension = 10


x = [Input(shape=(None, ), name='input', dtype='int32'),
     Input(shape=(None, ), name='weights'),
     Input(shape=(None, ), name='neg_examples', dtype='int32')]


embedding_layer = Embedding(input_dim=vocab_dimension, output_dim=embed_dimension)


def _weighted_sum(x):
    return K.sum(x[0] * K.expand_dims(x[1], -1), axis=1, keepdims=True)


weighted_sum = Lambda(_weighted_sum, name='weighted_sum')

item_vecs = embedding_layer(x[2])
user_vecs = weighted_sum([embedding_layer(x[0]), x[1]])
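For reference, the shape arithmetic inside `_weighted_sum` can be checked in plain numpy (a minimal sketch; the sizes here are illustrative, not from the real data):

```python
import numpy as np

batch, seq_len, embed_dim = 2, 5, 10
vecs = np.random.rand(batch, seq_len, embed_dim)   # embedded tokens
weights = np.random.rand(batch, seq_len)           # one weight per position

# same computation as _weighted_sum: broadcast the weights over the
# embedding axis, then sum out the sequence axis
out = np.sum(vecs * weights[:, :, None], axis=1, keepdims=True)
print(out.shape)  # (2, 1, 10)
```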

The problem here is that I would like to not pass the weights as an input, but instead 'learn' them, as in an attention layer.

I know that an attention layer could be created this way:

attention_probs = Dense(h, activation='softmax', name='attention_probs')(x[0])
weighted_sum = Lambda(_weighted_sum)([x[0], attention_probs])

Here h is equal to the length of the input sequence, which I set to 5. However, if I do the above I get the following error: TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

I think this has to do with the dimensions of the inputs, but I'm not sure how to fix it.

Brian

1 Answer


If you fix h, you'll also have to fix the length of the input. You set it to None, but the dimensions will only match when h == input_size. It's strange that you can even pass an undefined size to a Dense layer; usually you'd get an error like ValueError: The last dimension of the inputs to 'Dense' should be defined. Found 'None'.

Note that the Input layer's shape does not include the batch size. So if your backend tensor should have shape (batch_size, dim1), you create the input as Input((dim1,)).
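Putting that together, here is a minimal sketch of one way to learn the weights (written with tf.keras; the layer names and the choice of computing one attention score per position from the embeddings are my assumptions, not from your original code). The input length is fixed to 5 so every downstream layer knows its input size:

```python
import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Embedding, Dense, Lambda
from tensorflow.keras.models import Model

vocab_dimension = 1500
embed_dimension = 10
seq_len = 5  # fixed length instead of None

tokens = Input(shape=(seq_len,), dtype='int32', name='input')
embedded = Embedding(vocab_dimension, embed_dimension)(tokens)  # (batch, 5, 10)

# one learned score per position, normalised with softmax over the sequence
scores = Dense(1, name='attention_score')(embedded)             # (batch, 5, 1)
attention_probs = Lambda(
    lambda t: K.softmax(K.squeeze(t, -1), axis=-1),
    name='attention_probs')(scores)                             # (batch, 5)

def _weighted_sum(x):
    return K.sum(x[0] * K.expand_dims(x[1], -1), axis=1, keepdims=True)

user_vecs = Lambda(_weighted_sum, name='weighted_sum')([embedded, attention_probs])

model = Model(tokens, user_vecs)
out = model.predict(np.random.randint(0, vocab_dimension, size=(2, seq_len)))
print(out.shape)  # (2, 1, 10)
```

The key difference from your attempt is that the attention scores are computed from the embeddings rather than from the raw int32 token ids, so there is no shape mismatch between the Dense output and the sequence being weighted.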

Andrey Kite Gorin