Questions tagged [attention-model]

Questions about the attention mechanism in deep learning

389 questions
2
votes
0 answers

How to correctly load a deep learning model with a custom layer?

I trained and tested a DL model that uses a customized version of the Attention layer provided by Keras, because it needs to do more than the basic one. My performance is about 90.4% accuracy on the test set. But when I close the Colab notebook and…
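
A minimal sketch of the save/load round trip that usually resolves this, assuming TF 2.x, the .keras format, and a hypothetical MyAttention layer (not the asker's code): the custom layer implements get_config(), and load_model() is given the class via custom_objects in the fresh session.

import tensorflow as tf

class MyAttention(tf.keras.layers.Layer):
    def __init__(self, units=64, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.score_dense = tf.keras.layers.Dense(units, activation="tanh")
        self.weight_dense = tf.keras.layers.Dense(1)

    def call(self, inputs):
        # inputs: (batch, timesteps, features)
        scores = self.weight_dense(self.score_dense(inputs))   # (batch, T, 1)
        weights = tf.nn.softmax(scores, axis=1)                # attention over time
        return tf.reduce_sum(weights * inputs, axis=1)         # (batch, features)

    def get_config(self):
        config = super().get_config()
        config.update({"units": self.units})
        return config

inputs = tf.keras.Input(shape=(10, 32))
x = MyAttention(units=64)(inputs)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
model.save("model_with_attention.keras")

# In the new session, pass the custom class when loading:
reloaded = tf.keras.models.load_model(
    "model_with_attention.keras",
    custom_objects={"MyAttention": MyAttention},
)
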
2
votes
1 answer

Have I implemented self-attention correctly in PyTorch?

This is my attempt at implementing self-attention using PyTorch. Have I done anything wrong, or could it be improved somehow? class SelfAttention(nn.Module): def __init__(self, embedding_dim): super(SelfAttention, self).__init__() …
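
For reference, a minimal single-head scaled dot-product self-attention module in PyTorch (an illustrative sketch, not the asker's code): learned Q/K/V projections over a (batch, seq, dim) input.

import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, embedding_dim):
        super().__init__()
        self.query = nn.Linear(embedding_dim, embedding_dim)
        self.key = nn.Linear(embedding_dim, embedding_dim)
        self.value = nn.Linear(embedding_dim, embedding_dim)
        self.scale = math.sqrt(embedding_dim)

    def forward(self, x):
        # x: (batch, seq_len, embedding_dim)
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = torch.bmm(q, k.transpose(1, 2)) / self.scale   # (batch, seq, seq)
        weights = torch.softmax(scores, dim=-1)
        return torch.bmm(weights, v)                            # (batch, seq, dim)

x = torch.randn(4, 10, 32)
print(SelfAttention(32)(x).shape)  # torch.Size([4, 10, 32])
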
2
votes
1 answer

Mismatch between computational complexity of Additive attention and RNN cell

According to the "Attention Is All You Need" paper: additive attention (the classic attention used in RNNs by Bahdanau) computes the compatibility function using a feed-forward network with a single hidden layer. While the two are similar in theoretical…
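
A short sketch of the additive (Bahdanau-style) scoring the paper refers to, e_ij = v^T tanh(W_q q_i + W_k h_j), with illustrative tensor names:

import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, query_dim, key_dim, hidden_dim):
        super().__init__()
        self.w_query = nn.Linear(query_dim, hidden_dim, bias=False)
        self.w_key = nn.Linear(key_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, query, keys):
        # query: (batch, query_dim); keys: (batch, seq_len, key_dim)
        scores = self.v(torch.tanh(self.w_query(query).unsqueeze(1) + self.w_key(keys)))
        weights = torch.softmax(scores.squeeze(-1), dim=-1)        # (batch, seq_len)
        context = torch.bmm(weights.unsqueeze(1), keys).squeeze(1)  # weighted sum of keys
        return context, weights

attn = AdditiveAttention(query_dim=16, key_dim=32, hidden_dim=24)
ctx, w = attn(torch.randn(2, 16), torch.randn(2, 7, 32))
print(ctx.shape, w.shape)  # torch.Size([2, 32]) torch.Size([2, 7])
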
2
votes
2 answers

Understanding dimensions in the MultiHeadAttention layer of TensorFlow

I'm learning multi-head attention from this article. As the writer describes it, the structure of MHA (per the original paper) is as follows: But the MultiHeadAttention layer of TensorFlow seems to be more flexible: it does not require key_dim *…
lovetl2002
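
A short sketch of that flexibility (shapes are illustrative): tf.keras.layers.MultiHeadAttention does not require num_heads * key_dim to equal the model dimension, because it projects the result back to the query's last dimension (or to output_shape if given).

import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=16)  # 8 * 16 != 60
query = tf.random.normal((2, 10, 60))   # (batch, target_seq, model_dim)
value = tf.random.normal((2, 12, 60))   # (batch, source_seq, model_dim)

output, scores = mha(query, value, return_attention_scores=True)
print(output.shape)   # (2, 10, 60)   -- projected back to the query dimension
print(scores.shape)   # (2, 8, 10, 12) -- (batch, heads, target_seq, source_seq)
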
2
votes
1 answer

How can we get the attention scores of multimodal models via the Hugging Face library?

I was wondering whether we can get the attention scores of any multimodal model using the API provided by the Hugging Face library. It's relatively easy to get such scores from a normal language model like BERT, but what about LXMERT? If anyone can answer…
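
A sketch of one way to do this with the transformers library, assuming the unc-nlp/lxmert-base-uncased checkpoint and dummy visual features (output field names may vary across transformers versions): as with BERT, passing output_attentions=True returns the attention weights, and LXMERT separates them into language, vision, and cross-modal attentions.

import torch
from transformers import LxmertModel, LxmertTokenizer

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")

inputs = tokenizer("a cat sitting on a mat", return_tensors="pt")
visual_feats = torch.randn(1, 36, 2048)   # (batch, regions, feat_dim) from a detector
visual_pos = torch.rand(1, 36, 4)         # normalized bounding boxes

outputs = model(
    **inputs,
    visual_feats=visual_feats,
    visual_pos=visual_pos,
    output_attentions=True,
)
print(len(outputs.language_attentions))       # one tensor per language layer
print(len(outputs.vision_attentions))         # one tensor per vision layer
print(len(outputs.cross_encoder_attentions))  # one tensor per cross-modal layer
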
2
votes
1 answer

Pytorch MultiHeadAttention error with query sequence dimension different from key/value dimension

I am playing around with the PyTorch implementation of MultiHeadAttention. The docs state that the query dimensions are [N,L,E] (assuming batch_first=True), where N is the batch dimension, L is the target sequence length and E is the embedding…
BenedictWilkins
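
A minimal sketch with illustrative sizes: with batch_first=True, nn.MultiheadAttention accepts a query of shape (N, L, E) and key/value of shape (N, S, E); L and S may differ, only the embedding dimension E has to match.

import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
query = torch.randn(8, 5, 64)    # (N, L, E): target sequence length 5
key = torch.randn(8, 20, 64)     # (N, S, E): source sequence length 20
value = torch.randn(8, 20, 64)   # must share S and E with the key

out, weights = mha(query, key, value)
print(out.shape)      # torch.Size([8, 5, 64])
print(weights.shape)  # torch.Size([8, 5, 20]) -- averaged over heads by default
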
2
votes
0 answers

Do the multiple heads in multi-head attention actually lead to more parameters or different outputs?

I am trying to understand Transformers. While I understand the concept of the encoder-decoder structure and the idea behind self-attention, what I am stuck on is the "multi-head" part of the "MultiheadAttention" layer. Looking at this explanation…
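
A quick check (an illustration, not the linked explanation): in the usual implementation the total projection size is fixed at embed_dim, so more heads split that dimension rather than adding parameters; the parameter count stays the same, but each head attends over its own slice, so the outputs differ.

import torch
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

one_head = nn.MultiheadAttention(embed_dim=512, num_heads=1)
eight_heads = nn.MultiheadAttention(embed_dim=512, num_heads=8)
print(count_params(one_head), count_params(eight_heads))  # identical counts

# The outputs differ, though: each of the 8 heads works on a 512/8 = 64-dim slice.
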
2
votes
1 answer

Keras MultiHeadAttention layer throwing IndexError: tuple index out of range

I'm getting this error over and over again when trying to do self-attention on 1D vectors. I don't really understand why it happens; any help would be greatly appreciated. layer = layers.MultiHeadAttention(num_heads=2, key_dim=2) target =…
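
A sketch of what usually triggers this (shapes are illustrative, not the asker's data): tf.keras.layers.MultiHeadAttention expects rank-3 inputs, (batch, seq_len, features), so bare 1D vectors need batch and feature axes added before the call.

import tensorflow as tf
from tensorflow.keras import layers

layer = layers.MultiHeadAttention(num_heads=2, key_dim=2)

target = tf.random.normal((8,))               # a bare 1D vector
source = tf.random.normal((8,))

# Reshape to (batch=1, seq_len=8, features=1) before calling the layer.
target3d = tf.reshape(target, (1, 8, 1))
source3d = tf.reshape(source, (1, 8, 1))

output = layer(query=target3d, value=source3d)
print(output.shape)   # (1, 8, 1)
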
2
votes
1 answer

Adding a simple attention layer to a custom ResNet-18 architecture causes an error in the forward pass

I am adding the following code to my custom ResNet-18 code: self.layer1 = self._make_layer(block, 64, layers[0]) ## code existed before self.layer2 = self._make_layer(block, 128, layers[1], stride=2) ## code existed before self.layer_attend1 = …
Mona Jalal
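
A sketch of one way to gate a ResNet stage with a simple spatial attention map, shown here on torchvision's resnet18 rather than the asker's custom code: the attention module is defined in __init__, and its output multiplies the feature map so the shape layer2 expects is preserved.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class AttentionResNet18(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = resnet18(num_classes=num_classes)
        # 1x1 conv producing a single-channel spatial mask for layer1's 64-channel output
        self.layer_attend1 = nn.Sequential(
            nn.Conv2d(64, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b = self.backbone
        x = b.maxpool(b.relu(b.bn1(b.conv1(x))))
        x = b.layer1(x)
        x = x * self.layer_attend1(x)        # same shape as x, so layer2 still works
        x = b.layer4(b.layer3(b.layer2(x)))
        x = torch.flatten(b.avgpool(x), 1)
        return b.fc(x)

model = AttentionResNet18()
print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 10])
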
2
votes
1 answer

Adding an attention layer to a Keras seq2seq model

I have seen that Keras now comes with an Attention layer. However, I have some problems using it in my seq2seq model. This is the working seq2seq model without attention: latent_dim = 300 embedding_dim = 200 clear_session() # Encoder encoder_inputs =…
BlueMango
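
A trimmed sketch with illustrative vocabulary sizes (not the asker's model) of wiring keras.layers.Attention into an LSTM encoder-decoder: the attention layer takes [decoder_outputs, encoder_outputs], and its context is concatenated with the decoder outputs before the final softmax.

import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim, embedding_dim = 300, 200
src_vocab, tgt_vocab = 5000, 4000

# Encoder
encoder_inputs = layers.Input(shape=(None,))
enc_emb = layers.Embedding(src_vocab, embedding_dim)(encoder_inputs)
encoder_outputs, state_h, state_c = layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(enc_emb)

# Decoder, initialized with the encoder states
decoder_inputs = layers.Input(shape=(None,))
dec_emb = layers.Embedding(tgt_vocab, embedding_dim)(decoder_inputs)
decoder_outputs, _, _ = layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(
        dec_emb, initial_state=[state_h, state_c])

# Dot-product attention of the decoder states over the encoder outputs
context = layers.Attention()([decoder_outputs, encoder_outputs])
decoder_concat = layers.Concatenate(axis=-1)([decoder_outputs, context])
outputs = layers.Dense(tgt_vocab, activation="softmax")(decoder_concat)

model = Model([encoder_inputs, decoder_inputs], outputs)
model.summary()
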
2
votes
1 answer

tgt and src have to have equal features for a Transformer network in PyTorch

I am attempting to train a transformer network on EEG data. The input dimensions are 50x16684x60 (seq x batch x features) and the output is 16684x2. Right now I am simply trying to run a basic transformer, and I keep getting an error telling…
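
A sketch of one way around this error (layer names are illustrative): nn.Transformer requires src and tgt to both have d_model features, so the 60-dim EEG input and the 2-dim target are each projected to d_model before entering the transformer, and projected back afterwards.

import torch
import torch.nn as nn

d_model = 64
src_proj = nn.Linear(60, d_model)    # EEG features -> d_model
tgt_proj = nn.Linear(2, d_model)     # target features -> d_model
transformer = nn.Transformer(d_model=d_model, nhead=4)
readout = nn.Linear(d_model, 2)      # back to the 2-dim output

src = torch.randn(50, 8, 60)         # (seq, batch, features); batch shrunk for the demo
tgt = torch.randn(1, 8, 2)           # a 1-step target sequence

out = transformer(src_proj(src), tgt_proj(tgt))   # (1, 8, d_model)
print(readout(out).shape)                         # torch.Size([1, 8, 2])
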
2
votes
1 answer

Implementing a BiLSTM-Attention-CRF model using PyTorch

I am trying to implement the BiLSTM-Attention-CRF model for the NER task. I am able to perform NER tasks based on the BiLSTM-CRF model (code from here), but I need to add attention to improve the performance of the model. Right now my model is…
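
A sketch of the attention piece only (a hypothetical module, not the linked code): self-attention is applied over the BiLSTM outputs and the result is projected to per-tag emission scores, which would then feed the existing CRF layer unchanged.

import torch
import torch.nn as nn

class BiLSTMAttentionEmissions(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim // 2,
                              bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        self.to_tags = nn.Linear(hidden_dim, num_tags)

    def forward(self, tokens):
        h, _ = self.bilstm(self.embed(tokens))   # (batch, seq, hidden_dim)
        attn_out, _ = self.attn(h, h, h)         # self-attention over the sequence
        return self.to_tags(h + attn_out)        # emissions: (batch, seq, num_tags)

emissions = BiLSTMAttentionEmissions(1000, 100, 256, num_tags=9)(
    torch.randint(0, 1000, (4, 12)))
print(emissions.shape)  # torch.Size([4, 12, 9])
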
2
votes
1 answer

BigBird, or Sparse self-attention: How to implement a sparse matrix?

This question is related to the new paper: Big Bird: Transformers for Longer Sequences. Mainly, it is about the implementation of the sparse attention (specified in the supplemental material, part D). Currently, I am trying to implement it in…
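
A sketch of the BigBird-style sparsity pattern only, not the block-sparse kernels from the paper's appendix: a boolean mask combining a sliding window, a few global tokens, and random connections, which can then be applied to dense attention scores.

import torch

def bigbird_mask(seq_len, window=3, num_global=2, num_random=2, seed=0):
    g = torch.Generator().manual_seed(seed)
    idx = torch.arange(seq_len)
    # sliding window: each token attends to nearby tokens
    mask = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() <= window
    # global tokens attend to, and are attended by, every position
    mask[:num_global, :] = True
    mask[:, :num_global] = True
    # a few random keys per query
    rows = idx.unsqueeze(1)
    rand_cols = torch.randint(0, seq_len, (seq_len, num_random), generator=g)
    mask[rows, rand_cols] = True
    return mask   # (seq_len, seq_len) boolean, True = allowed to attend

mask = bigbird_mask(16)
scores = torch.randn(16, 16)
weights = torch.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)
print(mask.float().mean())   # fraction of score entries actually used
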
2
votes
1 answer

Multi Head Attention: Correct implementation of Linear Transformations of Q, K, V

I am implementing Multi-Head Self-Attention in PyTorch now. I looked at a couple of implementations and they seem a bit wrong, or at least I am not sure why they are done the way they are. They often apply the linear projection just once: …
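
A sketch of why a single Linear per Q/K/V is enough (illustrative sizes): projecting to embed_dim once and reshaping into heads is the same as giving every head its own head_dim-sized projection, because the big weight matrix is just the per-head matrices stacked.

import torch
import torch.nn as nn

embed_dim, num_heads = 64, 8
head_dim = embed_dim // num_heads

q_proj = nn.Linear(embed_dim, embed_dim, bias=False)            # one projection
x = torch.randn(2, 10, embed_dim)                               # (batch, seq, embed)

# one matmul, then split the last dimension into heads
q_all = q_proj(x).view(2, 10, num_heads, head_dim).transpose(1, 2)   # (B, H, S, Dh)

# equivalent per-head view: slice the shared weight into per-head matrices
per_head = [
    x @ q_proj.weight[h * head_dim:(h + 1) * head_dim].T             # (B, S, Dh)
    for h in range(num_heads)
]
print(torch.allclose(q_all[:, 3], per_head[3], atol=1e-6))           # True
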
2
votes
1 answer

Query padding mask and key padding mask in Transformer encoder

I'm implementing the self-attention part of a transformer encoder using PyTorch's nn.MultiheadAttention, and I'm confused about the padding masking of the transformer. The following picture shows the self-attention weights of the queries (rows) and keys (columns). As you…
Ian
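
A small sketch of the two maskings (illustrative sizes): key_padding_mask zeroes out attention *to* padded keys (columns), while padded *queries* (rows) still produce outputs, which are usually just ignored or masked out in the loss afterwards.

import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)
x = torch.randn(1, 5, 16)                       # one sequence of length 5
key_padding_mask = torch.tensor([[False, False, False, True, True]])  # last 2 = PAD

out, weights = mha(x, x, x, key_padding_mask=key_padding_mask)
print(weights[0, :, 3:])   # columns for padded keys are all zero
print(out[0, 3:])          # rows for padded queries still get (meaningless) outputs
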