
In the Keras example Timeseries classification with a Transformer model, the author builds a transformer encoder like this:

def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(x, x)
    ...

Later the encoder is called with head_size=256 (so key_dim=256), even though the input shape is (None, 500, 1).

Why can the input sequence length be 500 while the key dimension is 256? It seems they should be the same, since the input sequence is the key sequence.
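
For what it's worth, here is a minimal standalone check (my own sketch, assuming TensorFlow 2.x / tf.keras, not taken from the tutorial) confirming that the layer accepts key_dim=256 on a 500-step input and still returns an output with sequence length 500:

    import tensorflow as tf

    # dummy batch of 8 series, 500 timesteps, 1 feature each
    x = tf.random.normal((8, 500, 1))

    # key_dim is the per-head projection size for queries/keys,
    # not the sequence length
    mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=256)
    y = mha(x, x)

    print(y.shape)  # (8, 500, 1) -- same sequence length as the input

So the layer clearly allows it; I just don't understand why key_dim is independent of the sequence length.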


0 Answers