Questions tagged [self-attention]
57 questions
0
votes
0 answers
The parameter count of my attention layer is 0
When I build a multi_head_self_attention layer, I find that its parameter count is 0. What is wrong with this attention layer, and what should I do to fix it?
I initialize query, key, and value in __init__, and through the attention function I can get the result of…

123liang
- 1
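
A minimal sketch of one common cause (this is not the asker's code, and the layer and variable names are illustrative): if Q/K/V are computed with raw tensor ops or untracked variables, Keras reports zero parameters. Creating the projections as Dense sublayers in __init__ makes them trainable and visible in model.summary().

import tensorflow as tf

class MultiHeadSelfAttention(tf.keras.layers.Layer):
    def __init__(self, embed_dim, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        # Tracked sublayers: these contribute trainable parameters.
        self.query_proj = tf.keras.layers.Dense(embed_dim)
        self.key_proj = tf.keras.layers.Dense(embed_dim)
        self.value_proj = tf.keras.layers.Dense(embed_dim)
        self.out_proj = tf.keras.layers.Dense(embed_dim)

    def call(self, x):
        # Single-head attention for brevity; head splitting is omitted.
        q, k, v = self.query_proj(x), self.key_proj(x), self.value_proj(x)
        scores = tf.matmul(q, k, transpose_b=True)
        scores /= tf.math.sqrt(tf.cast(self.embed_dim, tf.float32))
        weights = tf.nn.softmax(scores, axis=-1)
        return self.out_proj(tf.matmul(weights, v))

inputs = tf.keras.Input(shape=(16, 64))
outputs = MultiHeadSelfAttention(embed_dim=64)(inputs)
tf.keras.Model(inputs, outputs).summary()  # parameter count is now non-zero
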
0
votes
0 answers
Cannot implement attention-LSTM-attention model
I am new to Keras and want to create a model with the structure input >> attention >> LSTM >> attention >> output.
But an error occurred when I ran model.fit: it gave a broadcastable-shapes error, and I don't understand why, since the model can be created…

Isaac
- 1
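
A minimal sketch of one frequent cause of this error, not the asker's model (layer sizes are arbitrary): if the LSTM between the two attention layers drops the time axis, downstream shapes no longer broadcast. Keeping return_sequences=True preserves the (batch, time, features) layout the second attention layer expects.

import tensorflow as tf

inputs = tf.keras.Input(shape=(20, 32))                 # (batch, time, features)
x = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)(inputs, inputs)
x = tf.keras.layers.LSTM(32, return_sequences=True)(x)  # keep the time dimension
x = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)(x, x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
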
0
votes
1 answer
NLP: transformer learning weights
The softmax function produces the attention weights, which are then multiplied (MatMul) with V.
Are these weights stored anywhere? And how does the learning process happen if the weights are not stored or reused in the next round?
Moreover, the linear transformation does not use the…
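
A short sketch to illustrate the point (framework and sizes chosen for illustration, not taken from any answer): the softmax attention weights are not stored as parameters; they are recomputed on every forward pass from the learned projection matrices, and only those projections are updated during training.

import torch
import torch.nn as nn

d_model = 8
W_q, W_k, W_v = (nn.Linear(d_model, d_model, bias=False) for _ in range(3))

x = torch.randn(1, 5, d_model)                      # (batch, tokens, d_model)
q, k, v = W_q(x), W_k(x), W_v(x)
scores = q @ k.transpose(-2, -1) / d_model ** 0.5
weights = torch.softmax(scores, dim=-1)             # transient, recomputed each pass
out = weights @ v

# Only the projection matrices hold trainable parameters.
print([p.shape for p in W_q.parameters()])
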
0
votes
1 answer
Sparse - Dense MultiHead Attention in Tensorflow Keras
For one of my objectives, I am trying to compute the multi-head attention matrix for a sparse matrix and a dense matrix. I understand that, by default, the Keras MultiHeadAttention API requires two dense matrices and returns the attention value after…

Arka Mukherjee
- 2,083
- 1
- 13
- 27
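
A minimal sketch of one workaround, assuming the sparse operand is a tf.SparseTensor (the shapes here are made up): Keras MultiHeadAttention expects dense inputs, so the sparse matrix can be densified just before the attention call.

import tensorflow as tf

sparse = tf.sparse.SparseTensor(
    indices=[[0, 0, 0], [0, 2, 1]], values=[1.0, 2.0], dense_shape=[1, 4, 8]
)
dense_query = tf.random.normal([1, 4, 8])

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)
attended = mha(query=dense_query, value=tf.sparse.to_dense(sparse))
print(attended.shape)  # (1, 4, 8)
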
0
votes
0 answers
A question about the structure of "query, key, value" in the Transformer
I'm a beginner in NLP,
so I'm trying to reproduce the most basic "Attention Is All You Need" transformer code.
But a question came up while doing it.
In the MultiHeadAttention layer,
I printed out the shapes of "query, key, value".
However, the different shapes of…
user16579274
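
A small sketch of why the printed shapes differ (sizes are arbitrary, not taken from the question): inside multi-head attention the (batch, seq, d_model) projection is reshaped to (batch, num_heads, seq, depth), so query, key, and value appear with an extra head dimension.

import tensorflow as tf

batch, seq_len, d_model, num_heads = 2, 10, 64, 8
depth = d_model // num_heads

x = tf.random.normal([batch, seq_len, d_model])
q = tf.keras.layers.Dense(d_model)(x)                  # (2, 10, 64)
q_heads = tf.reshape(q, [batch, seq_len, num_heads, depth])
q_heads = tf.transpose(q_heads, perm=[0, 2, 1, 3])     # (2, 8, 10, 8)
print(q.shape, q_heads.shape)
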
0
votes
1 answer
ValueError: Shapes (None, 5) and (None, 15, 5) are incompatible
I want to implement the hierarchical attention mechanism for document classification presented by Yang et al., but with the LSTM replaced by a Transformer.
I used Apoorv Nandan's text classification with…

Rahman
- 410
- 6
- 26
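
A minimal sketch of the likely mismatch (assuming 15 sentences per document and 5 classes, as the error message suggests): if the transformer block emits one vector per sentence, the model output is (None, 15, 5) while the labels are (None, 5); pooling over the sentence axis before the classifier reconciles the shapes.

import tensorflow as tf

inputs = tf.keras.Input(shape=(15, 128))                     # 15 sentence vectors
x = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=32)(inputs, inputs)
x = tf.keras.layers.GlobalAveragePooling1D()(x)              # (None, 128)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)  # (None, 5)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
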
0
votes
1 answer
How can I change the number of self-attention layers and the number of multi-head attention heads in my model with PyTorch?
I am working on a sarcasm dataset, and my model is as below:
I first tokenize my input text:
PRETRAINED_MODEL_NAME = "roberta-base"
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)
import torch
from…

mahdi rafiei
- 33
- 4
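
A sketch of one way to do this with the Hugging Face transformers config API (the chosen numbers are illustrative): the layer and head counts are config fields, so a modified architecture can be instantiated from an edited config. Note that from_config builds a freshly initialised model rather than loading the pretrained 12-layer, 12-head weights.

from transformers import AutoConfig, AutoModelForSequenceClassification

PRETRAINED_MODEL_NAME = "roberta-base"
config = AutoConfig.from_pretrained(PRETRAINED_MODEL_NAME)
config.num_hidden_layers = 6      # number of self-attention layers (default 12)
config.num_attention_heads = 8    # heads per layer (default 12; must divide hidden_size)
config.num_labels = 2             # e.g. sarcastic vs. not sarcastic

model = AutoModelForSequenceClassification.from_config(config)
print(model.config.num_hidden_layers, model.config.num_attention_heads)
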
0
votes
1 answer
Question about tokens used in Transformer decoder attention layers during Inference
I was looking at the shapes used in the decoder (both the self-attention and encoder-decoder attention blocks) and understand that the decoder runs differently during training versus during inference, based on this link and the original Attention…

Joe Black
- 625
- 6
- 19
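
A schematic sketch of the difference, with a dummy decoder standing in for the real one (the BOS/EOS ids and sizes are invented): during training the decoder self-attention sees the whole shifted target under a causal mask, while at inference tokens are generated one at a time and the decoder input grows each step.

import torch

BOS, EOS, vocab_size, max_len = 1, 2, 100, 10

def dummy_decoder_step(tokens):
    # Stand-in for the real decoder (self-attention over `tokens`, then
    # encoder-decoder attention over the encoder output): returns next-token logits.
    return torch.randn(vocab_size)

generated = [BOS]
for _ in range(max_len):
    logits = dummy_decoder_step(torch.tensor(generated))
    next_token = int(torch.argmax(logits))
    generated.append(next_token)
    if next_token == EOS:
        break
print(generated)  # the decoder input at step t is everything generated so far
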
0
votes
0 answers
Why do we need 'value', 'key', and 'query' in attention layer?
I am learning the basic ideas behind the Transformer model. Based on the paper and a tutorial I saw, the attention layer uses a neural network to get the 'value', the 'key', and the 'query'.
Here is the attention layer I learned from an online source.
class…

Xudong
- 441
- 5
- 16
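
A minimal sketch of scaled dot-product attention, not the class from the question: the query is compared against the keys to produce the attention weights, and those weights then mix the values, which is why three separate projections of the same input are needed.

import torch

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # query-key similarity
    weights = torch.softmax(scores, dim=-1)        # how much to attend where
    return weights @ v                             # weighted sum of the values

x = torch.randn(1, 4, 16)            # the same input feeds all three projections
W_q, W_k, W_v = (torch.nn.Linear(16, 16) for _ in range(3))
out = scaled_dot_product_attention(W_q(x), W_k(x), W_v(x))
print(out.shape)  # (1, 4, 16)
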
-1
votes
0 answers
Calculating Sentence Level Attention
How do I quantify the attention between input and output sentences in a sequence-to-sequence language modelling scenario [translation or summarization]?
For instance, consider these input and output statements, i.e., document is the input, and…
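
A sketch of one possible approach, assuming a Hugging Face encoder-decoder model (t5-small here purely for illustration) and known token spans for each sentence: the cross-attention weights returned with output_attentions=True can be averaged over layers and heads, then summed over the token spans of each source sentence.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # any encoder-decoder model works for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

src = tokenizer("summarize: The cat sat. The dog barked.", return_tensors="pt")
tgt = tokenizer("Animals made noise.", return_tensors="pt")

out = model(**src, labels=tgt.input_ids, output_attentions=True)
# cross_attentions: one tensor per layer, each (batch, heads, tgt_len, src_len)
cross = torch.stack(out.cross_attentions).mean(dim=(0, 2))[0]  # (tgt_len, src_len)

# Hypothetical sentence spans over the source tokens; in practice derive them
# from the tokenizer's offset mapping.
mid = cross.size(1) // 2
sentence_spans = {"sentence_1": range(0, mid), "sentence_2": range(mid, cross.size(1))}
for name, span in sentence_spans.items():
    print(name, float(cross[:, list(span)].sum()))
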
-2
votes
1 answer
What's the use of residual connections in neural networks?
I've recently been learning about self-attention transformers and the "Attention is All You Need" paper. When describing the architecture of the neural network used in the paper, one breakdown of the paper included this explanation for residual…
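
A minimal sketch of the residual (skip) connection pattern described in the paper, with an arbitrary feed-forward sublayer: the sublayer output is added back to its input, so gradients have a direct identity path and deep stacks stay trainable.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.sublayer(x))  # identity path + learned path

d_model = 32
ffn = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(), nn.Linear(64, d_model))
block = ResidualBlock(d_model, ffn)
print(block(torch.randn(2, 10, d_model)).shape)  # shape preserved: (2, 10, 32)
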
-2
votes
1 answer
Feeding an image to stacked resnet blocks to create an embedding
Do you have any code example or paper that refers to something like the following diagram?
I want to know why we stack multiple ResNet blocks, as opposed to multiple convolutional blocks as in more traditional architectures. Any code sample…

Mona Jalal
- 34,860
- 64
- 239
- 408
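
A sketch of the general idea, assuming a standard torchvision ResNet rather than the diagram from the question: the stacked residual blocks act as a feature extractor once the classification head is dropped, and the pooled output serves as the image embedding.

import torch
from torchvision import models

backbone = models.resnet18(weights=None)   # a stack of residual blocks
backbone.fc = torch.nn.Identity()          # drop the classification head

image = torch.randn(1, 3, 224, 224)        # one RGB image
with torch.no_grad():
    embedding = backbone(image)            # (1, 512) embedding vector
print(embedding.shape)
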