Questions tagged [self-attention]
57 questions
0
votes
0 answers
The parameter count of my attention layer is 0
When I build a multi_head_self_attention layer, I find that its parameter count is 0. What is wrong with this attention layer, and what should I do to fix it?
I initialize query, key, and value in __init__, and through the attention function I can get the result of…

123liang
- 1
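
A minimal sketch of one common cause (this is not the asker's code, and the layer and variable names are illustrative): if Q/K/V are computed with raw tensor ops or untracked variables, Keras reports zero parameters. Creating the projections as Dense sublayers in __init__ makes them trainable and visible in model.summary().

import tensorflow as tf

class MultiHeadSelfAttention(tf.keras.layers.Layer):
    def __init__(self, embed_dim, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        # Tracked sublayers: these contribute trainable parameters.
        self.query_proj = tf.keras.layers.Dense(embed_dim)
        self.key_proj = tf.keras.layers.Dense(embed_dim)
        self.value_proj = tf.keras.layers.Dense(embed_dim)
        self.out_proj = tf.keras.layers.Dense(embed_dim)

    def call(self, x):
        # Single-head attention for brevity; head splitting is omitted.
        q, k, v = self.query_proj(x), self.key_proj(x), self.value_proj(x)
        scores = tf.matmul(q, k, transpose_b=True)
        scores /= tf.math.sqrt(tf.cast(self.embed_dim, tf.float32))
        weights = tf.nn.softmax(scores, axis=-1)
        return self.out_proj(tf.matmul(weights, v))

inputs = tf.keras.Input(shape=(16, 64))
outputs = MultiHeadSelfAttention(embed_dim=64)(inputs)
tf.keras.Model(inputs, outputs).summary()  # parameter count is now non-zero
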
0
votes
0 answers
Cannot implement attention-LSTM-attention model
I am new to Keras and want to create a model with the structure input >> attention >> LSTM >> attention >> output.
But an error occurred when I ran model.fit: it gave a broadcastable-shapes error, and I don't understand why, since the model can be created…

Isaac
- 1
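
A minimal sketch of one frequent cause of this error, not the asker's model (layer sizes are arbitrary): if the LSTM between the two attention layers drops the time axis, downstream shapes no longer broadcast. Keeping return_sequences=True preserves the (batch, time, features) layout the second attention layer expects.

import tensorflow as tf

inputs = tf.keras.Input(shape=(20, 32))                 # (batch, time, features)
x = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)(inputs, inputs)
x = tf.keras.layers.LSTM(32, return_sequences=True)(x)  # keep the time dimension
x = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)(x, x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
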
0
votes
1 answer
NLP: transformer learning weights
The softmax function produces the attention weights, which are then multiplied (MatMul) with V.
Are these weights stored anywhere? And how does the learning process happen if the weights are not stored or reused in the next round?
Moreover, the linear transformation does not use the…
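
A short sketch to illustrate the point (framework and sizes chosen for illustration, not taken from any answer): the softmax attention weights are not stored as parameters; they are recomputed on every forward pass from the learned projection matrices, and only those projections are updated during training.

import torch
import torch.nn as nn

d_model = 8
W_q, W_k, W_v = (nn.Linear(d_model, d_model, bias=False) for _ in range(3))

x = torch.randn(1, 5, d_model)                      # (batch, tokens, d_model)
q, k, v = W_q(x), W_k(x), W_v(x)
scores = q @ k.transpose(-2, -1) / d_model ** 0.5
weights = torch.softmax(scores, dim=-1)             # transient, recomputed each pass
out = weights @ v

# Only the projection matrices hold trainable parameters.
print([p.shape for p in W_q.parameters()])
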
0
votes
1 answer
Sparse - Dense MultiHead Attention in Tensorflow Keras
For one of my objectives, I am trying to compute the multi-head attention matrix for a sparse matrix and a dense matrix. I understand that, by default, the Keras MultiHeadAttention API requires two dense matrices and returns the attention value after…

Arka Mukherjee
- 2,083
- 1
- 13
- 27
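
A minimal sketch of one workaround, assuming the sparse operand is a tf.SparseTensor (the shapes here are made up): Keras MultiHeadAttention expects dense inputs, so the sparse matrix can be densified just before the attention call.

import tensorflow as tf

sparse = tf.sparse.SparseTensor(
    indices=[[0, 0, 0], [0, 2, 1]], values=[1.0, 2.0], dense_shape=[1, 4, 8]
)
dense_query = tf.random.normal([1, 4, 8])

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)
attended = mha(query=dense_query, value=tf.sparse.to_dense(sparse))
print(attended.shape)  # (1, 4, 8)
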
0
votes
0 answers
A question about the structure of "query, key, value" in the Transformer
I'm a beginner in NLP,
so I'm trying to reproduce the most basic "Attention Is All You Need" transformer code.
But a question came up while doing it.
In the MultiHeadAttention layer,
I printed out the shapes of "query, key, value".
However, the different shapes of…
user16579274
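
A small sketch of why the printed shapes differ (sizes are arbitrary, not taken from the question): inside multi-head attention the (batch, seq, d_model) projection is reshaped to (batch, num_heads, seq, depth), so query, key, and value appear with an extra head dimension.

import tensorflow as tf

batch, seq_len, d_model, num_heads = 2, 10, 64, 8
depth = d_model // num_heads

x = tf.random.normal([batch, seq_len, d_model])
q = tf.keras.layers.Dense(d_model)(x)                  # (2, 10, 64)
q_heads = tf.reshape(q, [batch, seq_len, num_heads, depth])
q_heads = tf.transpose(q_heads, perm=[0, 2, 1, 3])     # (2, 8, 10, 8)
print(q.shape, q_heads.shape)
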
0
votes
1 answer
ValueError: Shapes (None, 5) and (None, 15, 5) are incompatible
I want to implement the hierarchical attention mechanism for document classification presented by Yang et al., but with the LSTM replaced by a Transformer.
I used Apoorv Nandan's text classification with…

Rahman
- 410
- 6
- 26
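
A minimal sketch of the likely mismatch (assuming 15 sentences per document and 5 classes, as the error message suggests): if the transformer block emits one vector per sentence, the model output is (None, 15, 5) while the labels are (None, 5); pooling over the sentence axis before the classifier reconciles the shapes.

import tensorflow as tf

inputs = tf.keras.Input(shape=(15, 128))                     # 15 sentence vectors
x = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=32)(inputs, inputs)
x = tf.keras.layers.GlobalAveragePooling1D()(x)              # (None, 128)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)  # (None, 5)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
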
0
votes
1 answer
How can I change the number of self-attention layers and the number of multi-head attention heads in my model with PyTorch?
I am working on a sarcasm dataset, and my model is as below:
I first tokenize my input text:
PRETRAINED_MODEL_NAME = "roberta-base"
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)
import torch
from…

mahdi rafiei
- 33
- 4
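
A sketch of one way to do this with the Hugging Face transformers config API (the chosen numbers are illustrative): the layer and head counts are config fields, so a modified architecture can be instantiated from an edited config. Note that from_config builds a freshly initialised model rather than loading the pretrained 12-layer, 12-head weights.

from transformers import AutoConfig, AutoModelForSequenceClassification

PRETRAINED_MODEL_NAME = "roberta-base"
config = AutoConfig.from_pretrained(PRETRAINED_MODEL_NAME)
config.num_hidden_layers = 6      # number of self-attention layers (default 12)
config.num_attention_heads = 8    # heads per layer (default 12; must divide hidden_size)
config.num_labels = 2             # e.g. sarcastic vs. not sarcastic

model = AutoModelForSequenceClassification.from_config(config)
print(model.config.num_hidden_layers, model.config.num_attention_heads)
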
0
votes
1 answer
Question about tokens used in Transformer decoder attention layers during Inference
I was looking at the shapes used in the decoder (both the self-attention and encoder-decoder attention blocks) and understand that the decoder runs differently during training versus during inference, based on this link and the original Attention…

Joe Black
- 625
- 6
- 19
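
A schematic sketch of the difference, with a dummy decoder standing in for the real one (the BOS/EOS ids and sizes are invented): during training the decoder self-attention sees the whole shifted target under a causal mask, while at inference tokens are generated one at a time and the decoder input grows each step.

import torch

BOS, EOS, vocab_size, max_len = 1, 2, 100, 10

def dummy_decoder_step(tokens):
    # Stand-in for the real decoder (self-attention over `tokens`, then
    # encoder-decoder attention over the encoder output): returns next-token logits.
    return torch.randn(vocab_size)

generated = [BOS]
for _ in range(max_len):
    logits = dummy_decoder_step(torch.tensor(generated))
    next_token = int(torch.argmax(logits))
    generated.append(next_token)
    if next_token == EOS:
        break
print(generated)  # the decoder input at step t is everything generated so far
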
0
votes
0 answers
Why do we need 'value', 'key', and 'query' in attention layer?
I am learning the basic ideas behind the Transformer model. Based on the paper and a tutorial I saw, the attention layer uses a neural network to get the 'value', the 'key', and the 'query'.
Here is the attention layer I learned from an online source.
class…

Xudong
- 441
- 5
- 16
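
A minimal sketch of scaled dot-product attention, not the class from the question: the query is compared against the keys to produce the attention weights, and those weights then mix the values, which is why three separate projections of the same input are needed.

import torch

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # query-key similarity
    weights = torch.softmax(scores, dim=-1)        # how much to attend where
    return weights @ v                             # weighted sum of the values

x = torch.randn(1, 4, 16)            # the same input feeds all three projections
W_q, W_k, W_v = (torch.nn.Linear(16, 16) for _ in range(3))
out = scaled_dot_product_attention(W_q(x), W_k(x), W_v(x))
print(out.shape)  # (1, 4, 16)
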
-1
votes
0 answers
Calculating Sentence Level Attention
How do I quantify the attention between input and output sentences in a sequence-to-sequence language modelling scenario [translation or summarization]?
For instance, consider these input and output statements, i.e., document is the input, and…
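
A sketch of one possible approach, assuming a Hugging Face encoder-decoder model (t5-small here purely for illustration) and known token spans for each sentence: the cross-attention weights returned with output_attentions=True can be averaged over layers and heads, then summed over the token spans of each source sentence.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # any encoder-decoder model works for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

src = tokenizer("summarize: The cat sat. The dog barked.", return_tensors="pt")
tgt = tokenizer("Animals made noise.", return_tensors="pt")

out = model(**src, labels=tgt.input_ids, output_attentions=True)
# cross_attentions: one tensor per layer, each (batch, heads, tgt_len, src_len)
cross = torch.stack(out.cross_attentions).mean(dim=(0, 2))[0]  # (tgt_len, src_len)

# Hypothetical sentence spans over the source tokens; in practice derive them
# from the tokenizer's offset mapping.
mid = cross.size(1) // 2
sentence_spans = {"sentence_1": range(0, mid), "sentence_2": range(mid, cross.size(1))}
for name, span in sentence_spans.items():
    print(name, float(cross[:, list(span)].sum()))
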
-2
votes
1 answer
What's the use of residual connections in neural networks?
I've recently been learning about self-attention transformers and the "Attention is All You Need" paper. When describing the architecture of the neural network used in the paper, one breakdown of the paper included this explanation for residual…
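
A minimal sketch of the residual (skip) connection pattern described in the paper, with an arbitrary feed-forward sublayer: the sublayer output is added back to its input, so gradients have a direct identity path and deep stacks stay trainable.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.sublayer(x))  # identity path + learned path

d_model = 32
ffn = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(), nn.Linear(64, d_model))
block = ResidualBlock(d_model, ffn)
print(block(torch.randn(2, 10, d_model)).shape)  # shape preserved: (2, 10, 32)
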
-2
votes
1 answer
Feeding an image to stacked resnet blocks to create an embedding
Do you have any code example or paper that refers to something like the following diagram?
I want to know why we stack multiple ResNet blocks, as opposed to multiple convolutional blocks as in more traditional architectures. Any code sample…

Mona Jalal
- 34,860
- 64
- 239
- 408
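
A sketch of the general idea, assuming a standard torchvision ResNet rather than the diagram from the question: the stacked residual blocks act as a feature extractor once the classification head is dropped, and the pooled output serves as the image embedding.

import torch
from torchvision import models

backbone = models.resnet18(weights=None)   # a stack of residual blocks
backbone.fc = torch.nn.Identity()          # drop the classification head

image = torch.randn(1, 3, 224, 224)        # one RGB image
with torch.no_grad():
    embedding = backbone(image)            # (1, 512) embedding vector
print(embedding.shape)
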