I'm a beginner at NLP, and I'm trying to reproduce the most basic Transformer from the "Attention Is All You Need" paper.
But I ran into a question while doing it.
In the MultiHeadAttention layer, I printed the shapes of query, key, and value, but query came out with a different shape from key and value. Since self-attention ultimately computes a sequence's correlation with itself, I expected all three to have the same shape. I don't understand why the shapes of query, key, and value differ.
[screenshot of the printed shapes] The values of query, key, and value all come from src, so why are their shapes different? [screenshot of the model code]
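To illustrate what I mean, here is a minimal sketch of scaled dot-product attention (my own NumPy toy, not the code I'm reproducing). It shows that query can legally have a different sequence length from key/value, e.g. in the decoder's cross-attention, where query comes from the target side and key/value come from the encoder output:

```python
import numpy as np

def attention(q, k, v):
    # q: (batch, heads, q_len, d_k); k, v: (batch, heads, kv_len, d_k)
    d_k = q.shape[-1]
    # Attention scores: (batch, heads, q_len, kv_len)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_k)
    # Softmax over the key/value length
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    # Output follows the query's length: (batch, heads, q_len, d_k)
    return weights @ v

q = np.random.randn(2, 8, 5, 64)   # query length 5
k = np.random.randn(2, 8, 7, 64)   # key/value length 7
v = np.random.randn(2, 8, 7, 64)
out = attention(q, k, v)
print(out.shape)  # (2, 8, 5, 64)
```

So the math itself only requires key and value to share a length; the query length is independent, and the output takes the query's length.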
I took the code from here.