Questions tagged [self-attention]

57 questions
0
votes
0 answers

The parameter count of my attention layer is 0

When I build multi_head_self_attention, I found that the parameter count of this layer is 0. What is wrong with this attention layer, and what should I do to modify it? I initialize query, key, and value in init, and via the attention function I can get the result of…
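A frequent cause of a zero parameter count is that the layer's weights are never registered with Keras, e.g. when sub-layers are created inside call() or the layer has not yet been built. A minimal sketch, assuming the built-in Keras API and made-up sizes, showing the parameters appearing once the layer has been called:

```python
# Minimal sketch (assumed shapes): Keras only creates a layer's weights once the
# layer has been built, i.e. called on an input with a known shape. A custom
# multi-head attention layer whose sub-layers are instantiated inside call()
# will report 0 parameters; creating them in __init__/build fixes that.
import tensorflow as tf

inputs = tf.keras.Input(shape=(16, 64))            # (seq_len, d_model), assumed sizes
mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)
outputs = mha(inputs, inputs)                      # self-attention: query = key = value
model = tf.keras.Model(inputs, outputs)
model.summary()                                    # the attention layer now shows non-zero params
```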
0
votes
0 answers

Cannot implement attention-LSTM-attention model

I am new to using Keras and want to create a model with the structure input >> attention >> LSTM >> attention >> output. But an error occurred when I ran model.fit: it reported an error about broadcastable shapes, and I don't understand why, since the model can be created…
Isaac
  • 1
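For reference, a hedged sketch of an input >> attention >> LSTM >> attention >> output stack in Keras, with all layer sizes assumed; the usual source of a broadcastable-shapes error in this setup is one block emitting a 2-D tensor where the next expects a 3-D sequence:

```python
# Hedged sketch with assumed sizes: keep every intermediate tensor 3-D
# (batch, time, features) until after the second attention block.
import tensorflow as tf

seq_len, features = 20, 32                                   # assumed sizes
inputs = tf.keras.Input(shape=(seq_len, features))
x = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)(inputs, inputs)
x = tf.keras.layers.LSTM(64, return_sequences=True)(x)       # keep the time axis for the next attention
x = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)(x, x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)              # collapse time before the dense head
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```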
0
votes
1 answer

NLP: transformer learning weights

The softmax function obtains the weights, which are then MatMul'd with V. Are the weights stored anywhere? Or how does the learning process happen if the weights are not stored or used in the next round? Moreover, the linear transformation does not use the…
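A short sketch of what is and is not learned in an attention block, assuming a plain PyTorch formulation with an assumed d_model: the linear projections hold the trainable weights, while the softmax attention weights are recomputed on every forward pass rather than stored:

```python
# Sketch with assumed sizes: w_q/w_k/w_v are the learned parameters updated by
# backprop; the softmax attention matrix is a transient activation.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model = 64
w_q = nn.Linear(d_model, d_model)   # learned
w_k = nn.Linear(d_model, d_model)   # learned
w_v = nn.Linear(d_model, d_model)   # learned

x = torch.randn(2, 10, d_model)     # (batch, seq_len, d_model)
q, k, v = w_q(x), w_k(x), w_v(x)
scores = q @ k.transpose(-2, -1) / d_model ** 0.5
attn = F.softmax(scores, dim=-1)    # recomputed each forward pass, not stored
out = attn @ v
```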
0
votes
1 answer

Sparse-Dense MultiHead Attention in TensorFlow Keras

For an objective, I am trying to compute the MultiHead Attention Matrix for a sparse matrix and a dense matrix. I understand that by default, the Keras MultiHead Attention API requires two dense matrices, and then returns the attention value after…
Arka Mukherjee
  • 2,083
  • 1
  • 13
  • 27
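One hedged workaround, assuming the sparse operand is a tf.sparse.SparseTensor: densify it before handing it to the Keras MultiHeadAttention layer, which only accepts dense tensors:

```python
# Hedged sketch with toy data: build a sparse tensor, then convert it to dense
# for the attention call, since MultiHeadAttention expects dense inputs.
import tensorflow as tf

dense_q = tf.random.normal((2, 8, 32))                        # dense query
mask = tf.cast(tf.random.uniform((2, 8, 32)) > 0.8, tf.float32)
sparse_kv = tf.sparse.from_dense(tf.random.normal((2, 8, 32)) * mask)

mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=8)
out = mha(query=dense_q, value=tf.sparse.to_dense(sparse_kv))  # densify before attending
```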
0
votes
0 answers

I want to ask about the structure of "query, key, value" in the Transformer

I'm a beginner at NLP, so I'm trying to reproduce the most basic "Attention Is All You Need" Transformer code, but a question came up while doing it. In the MultiHeadAttention layer, I printed out the shape of "query, key, value". However, the different shapes of…
user16579274
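A minimal sketch of where the differing shapes typically come from, assuming the standard tutorial layout with made-up sizes: query/key/value start as (batch, seq_len, d_model) and are reshaped into per-head tensors of shape (batch, num_heads, seq_len, depth):

```python
# Sketch with assumed sizes: the printed shape changes before and after the
# split into attention heads.
import tensorflow as tf

batch, seq_len, d_model, num_heads = 2, 10, 64, 4
depth = d_model // num_heads

q = tf.random.normal((batch, seq_len, d_model))
print(q.shape)                                    # (2, 10, 64) before the head split
q = tf.reshape(q, (batch, seq_len, num_heads, depth))
q = tf.transpose(q, perm=[0, 2, 1, 3])
print(q.shape)                                    # (2, 4, 10, 16) after the head split
```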
0
votes
1 answer

ValueError: Shapes (None, 5) and (None, 15, 5) are incompatible

I want to implement the hierarchical attention mechanism for document classification presented by Yang et al., but I want to replace the LSTM with a Transformer. I used Apoorv Nandan's text classification with…
Rahman
  • 410
  • 6
  • 26
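A hedged sketch of the usual fix for this kind of mismatch, assuming 15 sentences per document and 5 classes: pool the sequence axis before the classification head so the output becomes (None, 5) instead of (None, 15, 5):

```python
# Hedged sketch with assumed sizes: the attention block returns a per-sentence
# sequence, so the sequence axis must be pooled before the softmax head.
import tensorflow as tf

num_sentences, d_model, num_classes = 15, 128, 5   # assumed sizes
inputs = tf.keras.Input(shape=(num_sentences, d_model))
x = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=32)(inputs, inputs)
x = tf.keras.layers.GlobalAveragePooling1D()(x)    # (None, 15, 128) -> (None, 128)
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```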
0
votes
1 answer

How can I change the number of self-attention layers and multi-head attention heads in my model with PyTorch?

I am working on a sarcasm dataset and my model is like below. I first tokenize my input text: PRETRAINED_MODEL_NAME = "roberta-base" from transformers import AutoTokenizer tokenizer = AutoTokenizer.from_pretrained(PRETRAINED_MODEL_NAME) import torch from…
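One hedged approach, assuming the Hugging Face transformers API: layer and head counts live in the model config, so a model with a different depth or head count has to be built from a modified config (those layers are then randomly initialized rather than loaded from the pretrained checkpoint):

```python
# Sketch: adjust depth/width through the config; num_attention_heads must
# divide hidden_size (768 for roberta-base).
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("roberta-base")
config.num_hidden_layers = 6        # default is 12
config.num_attention_heads = 8      # default is 12
model = AutoModel.from_config(config)   # randomly initialized with the new shape
```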
0
votes
1 answer

Question about tokens used in Transformer decoder attention layers during inference

I was looking at the shapes used in the decoder (both the self-attention and encoder-decoder attention blocks) and understand there is a difference in the way the decoder runs during training versus inference, based on this link and the original Attention…
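A minimal sketch of the inference-time behaviour, assuming a generic seq2seq model(encoder_input, decoder_input) that returns per-position logits (both names are placeholders): during training the whole shifted target is fed at once under a causal mask, while at inference the decoder input grows one token per step:

```python
# Hypothetical greedy decoding loop; `model`, `start_id`, and `end_id` are
# assumed placeholders, not a specific library API.
import torch

def greedy_decode(model, encoder_input, start_id, end_id, max_len=50):
    decoder_input = torch.tensor([[start_id]])              # (1, 1): just the start token
    for _ in range(max_len):
        logits = model(encoder_input, decoder_input)        # (1, cur_len, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        decoder_input = torch.cat([decoder_input, next_id], dim=1)  # grow by one token
        if next_id.item() == end_id:
            break
    return decoder_input
```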
0
votes
0 answers

Why do we need 'value', 'key', and 'query' in the attention layer?

I am learning the basic ideas of the Transformer model. Based on the paper and a tutorial I saw, the attention layer uses a neural network to get the 'value', the 'key', and the 'query'. Here is the attention layer I learned from online: class…
Xudong
  • 441
  • 5
  • 16
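For orientation, a hedged single-head sketch with assumed sizes: three Dense projections produce the query (what a position is looking for), the key (what each position offers for matching), and the value (what is actually mixed into the output):

```python
# Single-head sketch; the class name and sizes are illustrative only.
import tensorflow as tf

class SimpleSelfAttention(tf.keras.layers.Layer):
    def __init__(self, d_model):
        super().__init__()
        self.wq = tf.keras.layers.Dense(d_model)   # query projection
        self.wk = tf.keras.layers.Dense(d_model)   # key projection
        self.wv = tf.keras.layers.Dense(d_model)   # value projection
        self.d_model = d_model

    def call(self, x):                             # x: (batch, seq_len, d_model)
        q, k, v = self.wq(x), self.wk(x), self.wv(x)
        scores = tf.matmul(q, k, transpose_b=True) / self.d_model ** 0.5
        weights = tf.nn.softmax(scores, axis=-1)   # how much each position attends to the others
        return tf.matmul(weights, v)               # weighted mix of the values
```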
-1
votes
0 answers

Calculating Sentence-Level Attention

How do I quantify the attention between input and output sentences in a sequence-to-sequence language modelling scenario [translation or summarization]? For instance, consider these input and output statements, i.e., the document is the input, and…
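One possible (not canonical) way to get a sentence-level score is to average token-level cross-attention weights over the tokens of each input sentence; attn and sentence_spans below are assumed inputs, not produced by any specific library:

```python
# Hedged sketch: attn is a (tgt_len, src_len) cross-attention matrix and
# sentence_spans maps each source sentence to its token range.
import numpy as np

def sentence_attention(attn, sentence_spans):
    """Return one averaged attention score per source sentence."""
    return [float(attn[:, start:end].mean()) for start, end in sentence_spans]

attn = np.random.rand(12, 30)                     # toy cross-attention matrix
spans = [(0, 10), (10, 20), (20, 30)]             # three source sentences
print(sentence_attention(attn, spans))
```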
-2
votes
1 answer

What's the use of residual connections in neural networks?

I've recently been learning about self-attention transformers and the "Attention is All You Need" paper. When describing the architecture of the neural network used in the paper, one breakdown of the paper included this explanation for residual…
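A minimal sketch of the idea, assuming a PyTorch-style block: the residual path adds the sub-layer's output back onto its input, so the block only has to learn a correction and gradients can flow through the identity path in deep stacks:

```python
# Illustrative residual wrapper; the class name and layer norm placement are
# assumptions, not the paper's exact code.
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, sublayer, d_model):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.sublayer(x))    # output = input + sub-layer(input)
```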
-2
votes
1 answer

Feeding an image to stacked ResNet blocks to create an embedding

Do you have any code example or paper that refers to something like the following diagram? I want to know why we would stack multiple ResNet blocks as opposed to multiple convolutional blocks, as in more traditional architectures. Any code sample…
Mona Jalal
  • 34,860
  • 64
  • 239
  • 408
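A hedged sketch of one common pattern, assuming a recent torchvision: drop the classification head of a pretrained ResNet and keep the stacked residual blocks plus global pooling, so an image maps to a fixed-size embedding vector:

```python
# Sketch: reuse torchvision's stacked residual blocks as an image encoder.
import torch
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()                  # remove the classifier head

image = torch.randn(1, 3, 224, 224)                # toy input image
with torch.no_grad():
    embedding = backbone(image)                    # (1, 512) embedding vector
```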