Questions regarding the attention mechanism in deep learning
Questions tagged [attention-model]
389 questions
2
votes
0 answers
How to correctly load a deep learning model with a custom layer?
I trained and tested a DL model that uses a modified version of the Attention layer provided by Keras, because it needs to do more than the basic one. My performance is about 90.4% accuracy on the test set. But when I close the Colab and…

daniele9845
- 21
- 1
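A minimal sketch of the usual fix when reloading a Keras model that contains a custom attention layer: register the class via custom_objects at load time. The class name CustomAttention and the file path model.h5 below are placeholders, not the asker's actual code.

import tensorflow as tf

class CustomAttention(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.score = tf.keras.layers.Dense(1)

    def call(self, inputs):
        # inputs: (batch, timesteps, features) -> weighted sum over timesteps
        weights = tf.nn.softmax(self.score(inputs), axis=1)
        return tf.reduce_sum(weights * inputs, axis=1)

# The saved file only stores the layer's name, so the class must be supplied
# explicitly when the model is deserialized in a fresh Colab session.
model = tf.keras.models.load_model(
    "model.h5", custom_objects={"CustomAttention": CustomAttention}
)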
2
votes
1 answer
Have I implemented self-attention correctly in Pytorch?
This is my attempt at implementing self-attention using PyTorch. Have I done anything wrong, or could it be improved somehow?
class SelfAttention(nn.Module):
    def __init__(self, embedding_dim):
        super(SelfAttention, self).__init__()
        …

Henry Gordon
- 21
- 2
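For reference, a minimal scaled dot-product self-attention module in PyTorch of the kind such an implementation is usually compared against; the details are illustrative, not the asker's code.

import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, embedding_dim):
        super().__init__()
        self.query = nn.Linear(embedding_dim, embedding_dim)
        self.key = nn.Linear(embedding_dim, embedding_dim)
        self.value = nn.Linear(embedding_dim, embedding_dim)

    def forward(self, x):
        # x: (batch, seq_len, embedding_dim)
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))  # (batch, seq, seq)
        return torch.softmax(scores, dim=-1) @ v                  # (batch, seq, embedding_dim)

x = torch.randn(2, 10, 32)
print(SelfAttention(32)(x).shape)  # torch.Size([2, 10, 32])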
2
votes
1 answer
Mismatch between computational complexity of Additive attention and RNN cell
According to the "Attention Is All You Need" paper, additive attention (the classic attention used in RNNs by Bahdanau) computes the compatibility function using a feed-forward network with a single hidden layer. While the two are similar in theoretical…

Ferdinand Mom
- 59
- 1
- 5
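As a reference point for the comparison, here is a rough sketch of the Bahdanau-style additive scoring function the paper refers to: a single-hidden-layer feed-forward network applied to every query/key pair. Dimensions and names are illustrative.

import torch
import torch.nn as nn

class AdditiveScore(nn.Module):
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.w_q = nn.Linear(d_model, d_hidden, bias=False)
        self.w_k = nn.Linear(d_model, d_hidden, bias=False)
        self.v = nn.Linear(d_hidden, 1, bias=False)

    def forward(self, query, keys):
        # query: (batch, d_model), keys: (batch, src_len, d_model)
        hidden = torch.tanh(self.w_q(query).unsqueeze(1) + self.w_k(keys))  # (batch, src_len, d_hidden)
        return self.v(hidden).squeeze(-1)                                   # one score per source position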
2
votes
2 answers
Understanding dimensions in MultiHeadAttention layer of Tensorflow
I'm learning multi-head attention with this article.
As the writer describes, the structure of MHA (from the original paper) is as follows:
But the MultiHeadAttention layer of TensorFlow seems to be more flexible:
It does not require key_dim *…

lovetl2002
- 910
- 9
- 23
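A quick shape check that may clarify the flexibility being discussed: in tf.keras.layers.MultiHeadAttention, key_dim is only the per-head projection size, while the output's last dimension defaults to the query's feature dimension. The values below are arbitrary.

import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)
query = tf.random.normal((2, 10, 64))   # (batch, target_len, query_features)
value = tf.random.normal((2, 20, 32))   # (batch, source_len, value_features)

out, scores = mha(query, value, return_attention_scores=True)
print(out.shape)     # (2, 10, 64)    -> last dim follows the query, not key_dim
print(scores.shape)  # (2, 4, 10, 20) -> (batch, heads, target_len, source_len)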
2
votes
1 answer
How can we get the attention scores of multimodal models via the Hugging Face library?
I was wondering if we could get the attention scores of any multimodal model using the API provided by the Hugging Face library. It's relatively easy to get such scores for a standard language model like BERT, but what about LXMERT? If anyone can answer…

lazytux
- 157
- 5
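The generic Hugging Face pattern is to pass output_attentions=True and read the attentions field of the model output, shown below for BERT; multimodal models such as LXMERT expose analogous, modality-specific attention fields whose exact names should be checked in their model documentation.

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("Attention scores, please.", return_tensors="pt")
outputs = model(**inputs)

# One tensor per layer, each of shape (batch, num_heads, seq_len, seq_len).
print(len(outputs.attentions), outputs.attentions[0].shape)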
2
votes
1 answer
Pytorch MultiHeadAttention error with query sequence dimension different from key/value dimension
I am playing around with the PyTorch implementation of MultiheadAttention.
The docs state that the query dimensions are [N, L, E] (assuming batch_first=True), where N is the batch dimension, L is the target sequence length, and E is the embedding…

BenedictWilkins
- 1,173
- 8
- 25
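A short sketch confirming the documented behaviour: nn.MultiheadAttention accepts a query sequence length L that differs from the key/value length S, as long as the embedding dimension matches. The numbers below are arbitrary.

import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
query = torch.randn(8, 5, 32)   # (N, L, E) with L = 5
key = torch.randn(8, 12, 32)    # (N, S, E) with S = 12
value = torch.randn(8, 12, 32)

out, weights = mha(query, key, value)
print(out.shape)      # torch.Size([8, 5, 32])  -> one output per query position
print(weights.shape)  # torch.Size([8, 5, 12])  -> averaged over heads by default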
2
votes
0 answers
Do the multiple heads in multi-head attention actually lead to more parameters or different outputs?
I am trying to understand Transformers. While I understand the concept of the encoder-decoder structure and the idea behind self-attention, what I am stuck on is the "multi-head" part of the MultiheadAttention layer.
Looking at this explanation…

Aushilfsgott
- 91
- 8
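A small experiment that speaks to the question: in the standard formulation the embedding dimension is split across heads (head_dim = embed_dim / num_heads), so adding heads does not add parameters; the heads differ in which learned subspace they attend over, hence different outputs rather than a bigger model. This is a sketch, not a proof.

import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

for heads in (1, 2, 4, 8):
    mha = nn.MultiheadAttention(embed_dim=64, num_heads=heads)
    print(heads, n_params(mha))  # identical parameter count for every head setting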
2
votes
1 answer
Keras MultiHeadAttention layer throwing IndexError: tuple index out of range
I'm getting this error over and over again when trying to do self-attention on 1D vectors. I don't really understand why it happens; any help would be greatly appreciated.
layer = layers.MultiHeadAttention(num_heads=2, key_dim=2)
target =…

Fourat Thamri
- 73
- 6
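A hedged guess at the usual cause, sketched below: Keras MultiHeadAttention expects rank-3 inputs of shape (batch, sequence, features), so batches of plain 1D vectors need an explicit sequence axis added first. The shapes are illustrative.

import tensorflow as tf
from tensorflow.keras import layers

layer = layers.MultiHeadAttention(num_heads=2, key_dim=2)

target = tf.random.normal((8, 16))          # (batch, features): rank 2 triggers the error
target_3d = tf.expand_dims(target, axis=1)  # (batch, 1, features): rank 3 works
source_3d = tf.expand_dims(tf.random.normal((8, 16)), axis=1)

output = layer(target_3d, source_3d)
print(output.shape)  # (8, 1, 16)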
2
votes
1 answer
Adding a simple attention layer to a custom ResNet-18 architecture causes an error in the forward pass
I am adding the following code to my custom ResNet-18 implementation:
self.layer1 = self._make_layer(block, 64, layers[0]) ## code existed before
self.layer2 = self._make_layer(block, 128, layers[1], stride=2) ## code existed before
self.layer_attend1 = …

Mona Jalal
- 34,860
- 64
- 239
- 408
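Since the definition of layer_attend1 is truncated, here is a purely illustrative, shape-preserving attention gate that could sit after layer1 of a ResNet-18 without breaking downstream dimensions; whatever module is used, it must also be invoked inside forward().

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)  # output keeps the input's (N, C, H, W) shape

# Hypothetical wiring inside the custom ResNet-18:
#   self.layer_attend1 = SpatialAttention(64)
#   ... and in forward():  x = self.layer_attend1(self.layer1(x))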
2
votes
1 answer
Attention layer to keras seq2seq model
I have seen that Keras now comes with an Attention layer. However, I am having some problems using it in my seq2seq model.
This is the working seq2seq model without attention:
latent_dim = 300
embedding_dim = 200
clear_session()
# Encoder
encoder_inputs =…

BlueMango
- 463
- 7
- 21
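A minimal sketch of one common way to wire tf.keras.layers.Attention into such an encoder-decoder: use the decoder states as the query and the encoder states as the value, then concatenate the context with the decoder output. The vocab_size and layer arrangement below are illustrative, not the asker's model.

import tensorflow as tf
from tensorflow.keras import layers

latent_dim, embedding_dim, vocab_size = 300, 200, 10000

encoder_inputs = layers.Input(shape=(None,))
enc_emb = layers.Embedding(vocab_size, embedding_dim)(encoder_inputs)
encoder_outputs, state_h, state_c = layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(enc_emb)

decoder_inputs = layers.Input(shape=(None,))
dec_emb = layers.Embedding(vocab_size, embedding_dim)(decoder_inputs)
decoder_outputs, _, _ = layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(
        dec_emb, initial_state=[state_h, state_c])

# Dot-product attention: query = decoder states, value = encoder states.
context = layers.Attention()([decoder_outputs, encoder_outputs])
concat = layers.Concatenate()([decoder_outputs, context])
outputs = layers.Dense(vocab_size, activation="softmax")(concat)

model = tf.keras.Model([encoder_inputs, decoder_inputs], outputs)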
2
votes
1 answer
tgt and src have to have equal features for a Transformer Network in Pytorch
I am attempting to train a transformer network on EEG data. The input dimensions are 50x16684x60 (seq x batch x features) and the output is 16684x2. Right now I am simply trying to run a basic transformer, and I keep getting an error telling…

Kartikeya Gullapalli
- 111
- 8
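A hedged illustration of the constraint behind the error: nn.Transformer expects both src and tgt to carry d_model features, so a 2-feature target typically needs its own input projection (and a final head mapping back to 2 outputs). The batch size is shrunk here purely for the demo.

import torch
import torch.nn as nn

d_model = 60
transformer = nn.Transformer(d_model=d_model, nhead=4)

src = torch.randn(50, 16, d_model)         # (seq, batch, features)
tgt_raw = torch.randn(1, 16, 2)            # targets with only 2 features
tgt_proj = nn.Linear(2, d_model)           # lift targets to d_model before the decoder
head = nn.Linear(d_model, 2)               # map decoder output back to 2 values

out = transformer(src, tgt_proj(tgt_raw))  # (1, 16, d_model)
print(head(out).shape)                     # torch.Size([1, 16, 2])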
2
votes
1 answer
Implementing BiLSTM-Attention-CRF Model using Pytorch
I am trying to implement the BiLSTM-Attention-CRF model for the NER task. I am able to perform NER based on the BiLSTM-CRF model (code from here), but I need to add attention to improve the performance of the model.
Right now my model is…

abhi8569
- 131
- 1
- 9
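One illustrative (not the asker's) way to insert attention between the BiLSTM and the emission scores feeding the CRF is token-wise self-attention over the BiLSTM outputs, which keeps one vector per token so the CRF interface is unchanged.

import torch
import torch.nn as nn

class BiLSTMAttentionEncoder(nn.Module):
    def __init__(self, emb_dim, hidden_dim, num_tags):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim // 2, bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=1, batch_first=True)
        self.emissions = nn.Linear(hidden_dim, num_tags)

    def forward(self, embeds):
        # embeds: (batch, seq_len, emb_dim)
        h, _ = self.lstm(embeds)          # (batch, seq_len, hidden_dim)
        attended, _ = self.attn(h, h, h)  # self-attention over the BiLSTM states
        return self.emissions(attended)   # per-token scores passed on to the CRF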
2
votes
1 answer
BigBird, or Sparse self-attention: How to implement a sparse matrix?
This question is related to the new paper Big Bird: Transformers for Longer Sequences, mainly about the implementation of the sparse attention specified in the supplemental material, part D. Currently, I am trying to implement it in…

Germans Savcisens
- 158
- 12
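Not BigBird's actual block-sparse kernels, but a dense boolean mask sketch of the same connectivity pattern (sliding window, a few global tokens, random links), which is often the first step before moving to a true sparse representation. All parameters below are arbitrary.

import torch

def bigbird_style_mask(seq_len, window=3, n_global=2, n_random=2, seed=0):
    g = torch.Generator().manual_seed(seed)
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True                                             # local sliding window
        mask[i, torch.randint(seq_len, (n_random,), generator=g)] = True  # random links
    mask[:n_global, :] = True  # global tokens attend everywhere
    mask[:, :n_global] = True  # ...and are attended to by every token
    return mask                # True = attention allowed

print(bigbird_style_mask(8).int())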
2
votes
1 answer
Multi Head Attention: Correct implementation of Linear Transformations of Q, K, V
I am implementing Multi-Head Self-Attention in PyTorch. I looked at a couple of implementations and they seem a bit wrong, or at least I am not sure why they are done the way they are. They often apply the linear projection just once:
…

Germans Savcisens
- 158
- 12
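A short sketch of why a single linear layer per Q/K/V is standard: one embed_dim x embed_dim projection followed by a reshape into heads computes exactly what separate per-head projections would, because each head simply reads a different slice of the output features.

import torch
import torch.nn as nn

batch, seq, embed_dim, n_heads = 2, 5, 16, 4
head_dim = embed_dim // n_heads

x = torch.randn(batch, seq, embed_dim)
w_q = nn.Linear(embed_dim, embed_dim, bias=False)  # one projection shared by all heads

q = w_q(x).view(batch, seq, n_heads, head_dim).transpose(1, 2)  # (batch, heads, seq, head_dim)

# Slicing the fused weight matrix recovers the "per-head" projection for head 0.
q_head0 = x @ w_q.weight[:head_dim, :].T
print(torch.allclose(q[:, 0], q_head0, atol=1e-6))  # True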
2
votes
1 answer
Query padding mask and key padding mask in Transformer encoder
I'm implementing the self-attention part of a transformer encoder using PyTorch's nn.MultiheadAttention, and I am confused about the padding masking in the transformer.
The following picture shows the self-attention weights for the queries (rows) and keys (columns).
As you…

Ian
- 99
- 11
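A small sketch of the masking semantics in nn.MultiheadAttention: key positions marked True in key_padding_mask receive zero weight for every query, whereas padded query rows are not masked at all; their outputs are simply ignored downstream (for example by the loss).

import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
x = torch.randn(1, 5, 8)                                              # one sequence of length 5
key_padding_mask = torch.tensor([[False, False, False, True, True]])  # last two tokens are padding

out, weights = mha(x, x, x, key_padding_mask=key_padding_mask)
print(weights[0])  # the columns for positions 3 and 4 are exactly zero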