Recently I was going through the Attention Is All You Need paper, and while reading it I had trouble understanding the attention network once I ignored the maths behind it. Can anyone explain the attention network with an example?
1 Answer
This tutorial illustrates each core component of the Transformer and is definitely worth reading.
Intuitively, the attention mechanism tries to find the timesteps that are most "similar" to the current one according to an attention function (e.g. projection followed by scaled dot-product in Attention Is All You Need), and then computes the new representation as a weighted sum of the previous representations, using those similarity scores as the weights.
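
To make that concrete, here is a minimal NumPy sketch of scaled dot-product attention as defined in the paper, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The toy inputs, random projection matrices, and variable names are my own illustration, not code from the paper or the tutorial:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Similarity of every query timestep to every key timestep.
    scores = Q @ K.T / np.sqrt(d_k)
    # Normalize scores into attention weights that sum to 1 per query.
    weights = softmax(scores, axis=-1)
    # New representation: weighted sum of the value vectors.
    return weights @ V, weights

# Toy example: 3 timesteps with d_k = 4. The random matrices stand in
# for the learned linear projections applied to the input embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))                            # input representations
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
out, w = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(w)    # row i shows how much timestep i attends to each timestep
print(out)  # the new representation of each timestep
```

Each row of `w` is exactly the "similarity" described above: the weights with which one timestep mixes together the value vectors of all timesteps to form its new representation.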

Crystina