Questions regarding the attention mechanism in deep learning
Questions tagged [attention-model]
389 questions
2
votes
0 answers
How to correctly load a deep learning model with a custom layer?
I trained and tested a DL model that uses a modified version of the Attention layer provided by Keras, because it needs to do more than the basic one. My performance is about 90.4% accuracy on the test set. But when I close the Colab and…

daniele9845
- 21
- 1
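A minimal sketch of the usual fix when reloading a Keras model that contains a custom attention layer: register the class via custom_objects at load time. The class name CustomAttention and the file path model.h5 below are placeholders, not the asker's actual code.

import tensorflow as tf

class CustomAttention(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.score = tf.keras.layers.Dense(1)

    def call(self, inputs):
        # inputs: (batch, timesteps, features) -> weighted sum over timesteps
        weights = tf.nn.softmax(self.score(inputs), axis=1)
        return tf.reduce_sum(weights * inputs, axis=1)

# The saved file only stores the layer's name, so the class must be supplied
# explicitly when the model is deserialized in a fresh Colab session.
model = tf.keras.models.load_model(
    "model.h5", custom_objects={"CustomAttention": CustomAttention}
)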
2
votes
1 answer
Have I implemented self-attention correctly in Pytorch?
This is my attempt at implementing self-attention using PyTorch. Have I done anything wrong, or could it be improved somehow?
class SelfAttention(nn.Module):
    def __init__(self, embedding_dim):
        super(SelfAttention, self).__init__()
        …

Henry Gordon
- 21
- 2
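For reference, a minimal scaled dot-product self-attention module in PyTorch of the kind such an implementation is usually compared against; the details are illustrative, not the asker's code.

import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, embedding_dim):
        super().__init__()
        self.query = nn.Linear(embedding_dim, embedding_dim)
        self.key = nn.Linear(embedding_dim, embedding_dim)
        self.value = nn.Linear(embedding_dim, embedding_dim)

    def forward(self, x):
        # x: (batch, seq_len, embedding_dim)
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))  # (batch, seq, seq)
        return torch.softmax(scores, dim=-1) @ v                  # (batch, seq, embedding_dim)

x = torch.randn(2, 10, 32)
print(SelfAttention(32)(x).shape)  # torch.Size([2, 10, 32])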
2
votes
1 answer
Mismatch between computational complexity of Additive attention and RNN cell
According to the "Attention Is All You Need" paper, additive attention (the classic attention used in RNNs by Bahdanau) computes the compatibility function using a feed-forward network with a single hidden layer. While the two are similar in theoretical…

Ferdinand Mom
- 59
- 1
- 5
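As a reference point for the comparison, here is a rough sketch of the Bahdanau-style additive scoring function the paper refers to: a single-hidden-layer feed-forward network applied to every query/key pair. Dimensions and names are illustrative.

import torch
import torch.nn as nn

class AdditiveScore(nn.Module):
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.w_q = nn.Linear(d_model, d_hidden, bias=False)
        self.w_k = nn.Linear(d_model, d_hidden, bias=False)
        self.v = nn.Linear(d_hidden, 1, bias=False)

    def forward(self, query, keys):
        # query: (batch, d_model), keys: (batch, src_len, d_model)
        hidden = torch.tanh(self.w_q(query).unsqueeze(1) + self.w_k(keys))  # (batch, src_len, d_hidden)
        return self.v(hidden).squeeze(-1)                                   # one score per source position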
2
votes
2 answers
Understanding dimensions in MultiHeadAttention layer of Tensorflow
I'm learning multi-head attention with this article.
As the writer describes, the structure of MHA (from the original paper) is as follows:
But the MultiHeadAttention layer of TensorFlow seems to be more flexible:
It does not require key_dim *…

lovetl2002
- 910
- 9
- 23
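A quick shape check that may clarify the flexibility being discussed: in tf.keras.layers.MultiHeadAttention, key_dim is only the per-head projection size, while the output's last dimension defaults to the query's feature dimension. The values below are arbitrary.

import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)
query = tf.random.normal((2, 10, 64))   # (batch, target_len, query_features)
value = tf.random.normal((2, 20, 32))   # (batch, source_len, value_features)

out, scores = mha(query, value, return_attention_scores=True)
print(out.shape)     # (2, 10, 64)    -> last dim follows the query, not key_dim
print(scores.shape)  # (2, 4, 10, 20) -> (batch, heads, target_len, source_len)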
2
votes
1 answer
How can we get the attention scores of multimodal models via the Hugging Face library?
I was wondering if we could get the attention scores of any multimodal model using the API provided by the Hugging Face library. It's relatively easy to get such scores for a standard language model like BERT, but what about LXMERT? If anyone can answer…

lazytux
- 157
- 5
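The generic Hugging Face pattern is to pass output_attentions=True and read the attentions field of the model output, shown below for BERT; multimodal models such as LXMERT expose analogous, modality-specific attention fields whose exact names should be checked in their model documentation.

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("Attention scores, please.", return_tensors="pt")
outputs = model(**inputs)

# One tensor per layer, each of shape (batch, num_heads, seq_len, seq_len).
print(len(outputs.attentions), outputs.attentions[0].shape)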
2
votes
1 answer
Pytorch MultiHeadAttention error with query sequence dimension different from key/value dimension
I am playing around with the PyTorch implementation of MultiheadAttention.
The docs state that the query dimensions are [N, L, E] (assuming batch_first=True), where N is the batch dimension, L is the target sequence length, and E is the embedding…

BenedictWilkins
- 1,173
- 8
- 25
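A short sketch confirming the documented behaviour: nn.MultiheadAttention accepts a query sequence length L that differs from the key/value length S, as long as the embedding dimension matches. The numbers below are arbitrary.

import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
query = torch.randn(8, 5, 32)   # (N, L, E) with L = 5
key = torch.randn(8, 12, 32)    # (N, S, E) with S = 12
value = torch.randn(8, 12, 32)

out, weights = mha(query, key, value)
print(out.shape)      # torch.Size([8, 5, 32])  -> one output per query position
print(weights.shape)  # torch.Size([8, 5, 12])  -> averaged over heads by default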
2
votes
0 answers
Do the multiple heads in multi-head attention actually lead to more parameters or different outputs?
I am trying to understand Transformers. While I understand the concept of the encoder-decoder structure and the idea behind self-attention, what I am stuck on is the "multi-head" part of the MultiheadAttention layer.
Looking at this explanation…

Aushilfsgott
- 91
- 8
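A small experiment that speaks to the question: in the standard formulation the embedding dimension is split across heads (head_dim = embed_dim / num_heads), so adding heads does not add parameters; the heads differ in which learned subspace they attend over, hence different outputs rather than a bigger model. This is a sketch, not a proof.

import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

for heads in (1, 2, 4, 8):
    mha = nn.MultiheadAttention(embed_dim=64, num_heads=heads)
    print(heads, n_params(mha))  # identical parameter count for every head setting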
2
votes
1 answer
Keras MultiHeadAttention layer throwing IndexError: tuple index out of range
I'm getting this error over and over again when trying to do self-attention on 1D vectors. I don't really understand why it happens; any help would be greatly appreciated.
layer = layers.MultiHeadAttention(num_heads=2, key_dim=2)
target =…

Fourat Thamri
- 73
- 6
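A hedged guess at the usual cause, sketched below: Keras MultiHeadAttention expects rank-3 inputs of shape (batch, sequence, features), so batches of plain 1D vectors need an explicit sequence axis added first. The shapes are illustrative.

import tensorflow as tf
from tensorflow.keras import layers

layer = layers.MultiHeadAttention(num_heads=2, key_dim=2)

target = tf.random.normal((8, 16))          # (batch, features): rank 2 triggers the error
target_3d = tf.expand_dims(target, axis=1)  # (batch, 1, features): rank 3 works
source_3d = tf.expand_dims(tf.random.normal((8, 16)), axis=1)

output = layer(target_3d, source_3d)
print(output.shape)  # (8, 1, 16)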
2
votes
1 answer
Adding a simple attention layer to a custom ResNet-18 architecture causes an error in the forward pass
I am adding the following code to my custom ResNet-18 implementation:
self.layer1 = self._make_layer(block, 64, layers[0]) ## code existed before
self.layer2 = self._make_layer(block, 128, layers[1], stride=2) ## code existed before
self.layer_attend1 = …

Mona Jalal
- 34,860
- 64
- 239
- 408
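Since the definition of layer_attend1 is truncated, here is a purely illustrative, shape-preserving attention gate that could sit after layer1 of a ResNet-18 without breaking downstream dimensions; whatever module is used, it must also be invoked inside forward().

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)  # output keeps the input's (N, C, H, W) shape

# Hypothetical wiring inside the custom ResNet-18:
#   self.layer_attend1 = SpatialAttention(64)
#   ... and in forward():  x = self.layer_attend1(self.layer1(x))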
2
votes
1 answer
Attention layer to keras seq2seq model
I have seen that Keras now comes with an Attention layer. However, I am having some problems using it in my seq2seq model.
This is the working seq2seq model without attention:
latent_dim = 300
embedding_dim = 200
clear_session()
# Encoder
encoder_inputs =…

BlueMango
- 463
- 7
- 21
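A minimal sketch of one common way to wire tf.keras.layers.Attention into such an encoder-decoder: use the decoder states as the query and the encoder states as the value, then concatenate the context with the decoder output. The vocab_size and layer arrangement below are illustrative, not the asker's model.

import tensorflow as tf
from tensorflow.keras import layers

latent_dim, embedding_dim, vocab_size = 300, 200, 10000

encoder_inputs = layers.Input(shape=(None,))
enc_emb = layers.Embedding(vocab_size, embedding_dim)(encoder_inputs)
encoder_outputs, state_h, state_c = layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(enc_emb)

decoder_inputs = layers.Input(shape=(None,))
dec_emb = layers.Embedding(vocab_size, embedding_dim)(decoder_inputs)
decoder_outputs, _, _ = layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(
        dec_emb, initial_state=[state_h, state_c])

# Dot-product attention: query = decoder states, value = encoder states.
context = layers.Attention()([decoder_outputs, encoder_outputs])
concat = layers.Concatenate()([decoder_outputs, context])
outputs = layers.Dense(vocab_size, activation="softmax")(concat)

model = tf.keras.Model([encoder_inputs, decoder_inputs], outputs)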
2
votes
1 answer
tgt and src have to have equal features for a Transformer Network in Pytorch
I am attempting to train a transformer network on EEG data. The input dimensions are 50x16684x60 (seq x batch x features) and the output is 16684x2. Right now I am simply trying to run a basic transformer, and I keep getting an error telling…

Kartikeya Gullapalli
- 111
- 8
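A hedged illustration of the constraint behind the error: nn.Transformer expects both src and tgt to carry d_model features, so a 2-feature target typically needs its own input projection (and a final head mapping back to 2 outputs). The batch size is shrunk here purely for the demo.

import torch
import torch.nn as nn

d_model = 60
transformer = nn.Transformer(d_model=d_model, nhead=4)

src = torch.randn(50, 16, d_model)         # (seq, batch, features)
tgt_raw = torch.randn(1, 16, 2)            # targets with only 2 features
tgt_proj = nn.Linear(2, d_model)           # lift targets to d_model before the decoder
head = nn.Linear(d_model, 2)               # map decoder output back to 2 values

out = transformer(src, tgt_proj(tgt_raw))  # (1, 16, d_model)
print(head(out).shape)                     # torch.Size([1, 16, 2])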
2
votes
1 answer
Implementing BiLSTM-Attention-CRF Model using Pytorch
I am trying to implement the BiLSTM-Attention-CRF model for the NER task. I am able to perform NER based on the BiLSTM-CRF model (code from here), but I need to add attention to improve the performance of the model.
Right now my model is…

abhi8569
- 131
- 1
- 9
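One illustrative (not the asker's) way to insert attention between the BiLSTM and the emission scores feeding the CRF is token-wise self-attention over the BiLSTM outputs, which keeps one vector per token so the CRF interface is unchanged.

import torch
import torch.nn as nn

class BiLSTMAttentionEncoder(nn.Module):
    def __init__(self, emb_dim, hidden_dim, num_tags):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim // 2, bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=1, batch_first=True)
        self.emissions = nn.Linear(hidden_dim, num_tags)

    def forward(self, embeds):
        # embeds: (batch, seq_len, emb_dim)
        h, _ = self.lstm(embeds)          # (batch, seq_len, hidden_dim)
        attended, _ = self.attn(h, h, h)  # self-attention over the BiLSTM states
        return self.emissions(attended)   # per-token scores passed on to the CRF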
2
votes
1 answer
BigBird, or Sparse self-attention: How to implement a sparse matrix?
This question is related to the new paper Big Bird: Transformers for Longer Sequences, mainly about the implementation of the sparse attention specified in the supplemental material, part D. Currently, I am trying to implement it in…

Germans Savcisens
- 158
- 12
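Not BigBird's actual block-sparse kernels, but a dense boolean mask sketch of the same connectivity pattern (sliding window, a few global tokens, random links), which is often the first step before moving to a true sparse representation. All parameters below are arbitrary.

import torch

def bigbird_style_mask(seq_len, window=3, n_global=2, n_random=2, seed=0):
    g = torch.Generator().manual_seed(seed)
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True                                             # local sliding window
        mask[i, torch.randint(seq_len, (n_random,), generator=g)] = True  # random links
    mask[:n_global, :] = True  # global tokens attend everywhere
    mask[:, :n_global] = True  # ...and are attended to by every token
    return mask                # True = attention allowed

print(bigbird_style_mask(8).int())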
2
votes
1 answer
Multi Head Attention: Correct implementation of Linear Transformations of Q, K, V
I am implementing Multi-Head Self-Attention in PyTorch. I looked at a couple of implementations and they seem a bit wrong, or at least I am not sure why they are done the way they are. They often apply the linear projection just once:
…

Germans Savcisens
- 158
- 12
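A short sketch of why a single linear layer per Q/K/V is standard: one embed_dim x embed_dim projection followed by a reshape into heads computes exactly what separate per-head projections would, because each head simply reads a different slice of the output features.

import torch
import torch.nn as nn

batch, seq, embed_dim, n_heads = 2, 5, 16, 4
head_dim = embed_dim // n_heads

x = torch.randn(batch, seq, embed_dim)
w_q = nn.Linear(embed_dim, embed_dim, bias=False)  # one projection shared by all heads

q = w_q(x).view(batch, seq, n_heads, head_dim).transpose(1, 2)  # (batch, heads, seq, head_dim)

# Slicing the fused weight matrix recovers the "per-head" projection for head 0.
q_head0 = x @ w_q.weight[:head_dim, :].T
print(torch.allclose(q[:, 0], q_head0, atol=1e-6))  # True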
2
votes
1 answer
Query padding mask and key padding mask in Transformer encoder
I'm implementing the self-attention part of a transformer encoder using PyTorch's nn.MultiheadAttention, and I am confused about the padding masking in the transformer.
The following picture shows the self-attention weights for the queries (rows) and keys (columns).
As you…

Ian
- 99
- 11
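A small sketch of the masking semantics in nn.MultiheadAttention: key positions marked True in key_padding_mask receive zero weight for every query, whereas padded query rows are not masked at all; their outputs are simply ignored downstream (for example by the loss).

import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
x = torch.randn(1, 5, 8)                                              # one sequence of length 5
key_padding_mask = torch.tensor([[False, False, False, True, True]])  # last two tokens are padding

out, weights = mha(x, x, x, key_padding_mask=key_padding_mask)
print(weights[0])  # the columns for positions 3 and 4 are exactly zero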