Questions tagged [attention-model]

Questions about the attention mechanism in deep learning models

389 questions
4 votes, 0 answers

Self-Attention Explainability of the Output Score Matrix

I am learning about attention models and following along with Jay Alammar's amazing blog tutorial, The Illustrated Transformer. He gives a great walkthrough of how the attention scores are calculated, but I get a bit lost at a certain point, and…
Yu Chen • 6,540 • 6 • 51 • 86
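For readers following the same walkthrough, a minimal numerical sketch of scaled dot-product attention, softmax(QKᵀ / √d_k)·V, may help make the score matrix concrete. The shapes and random values below are illustrative assumptions, not values from the blog post.

```python
import torch
import torch.nn.functional as F

# Toy example: 3 tokens, d_k = 4 (shapes chosen purely for illustration).
torch.manual_seed(0)
Q = torch.randn(3, 4)   # queries, one row per token
K = torch.randn(3, 4)   # keys
V = torch.randn(3, 4)   # values

d_k = Q.size(-1)
scores = Q @ K.T / d_k ** 0.5          # raw attention scores, shape (3, 3)
weights = F.softmax(scores, dim=-1)    # row i = how much token i attends to each token j
output = weights @ V                   # attended representation, shape (3, 4)

print(weights)  # the "output score matrix" the question asks about
print(output)
```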
4 votes, 0 answers

HAN ValueError: Unknown layer: AttentionWithContext when making a deep copy of the model

I am fitting a HAN model to my data and want to save the model from each iteration. For this purpose I am building a list of models, one per iteration, and I get the following error while deep-copying the model: ValueError: Unknown layer:…
Sadaf • 79 • 2
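A common cause of this error is that Keras cannot rebuild a custom layer during (de)serialization unless it is registered. Below is only a hedged sketch of the usual workaround, assuming the custom layer is called AttentionWithContext as in the question and implements get_config; the import path is a placeholder.

```python
from keras.models import clone_model, load_model

# `AttentionWithContext` is assumed to be the custom layer class from the
# question's project; the import below is a hypothetical path.
# from attention_with_context import AttentionWithContext

def snapshot(model):
    """Return an independent copy of a Keras model that contains custom layers,
    avoiding copy.deepcopy(), which can fail with 'Unknown layer' when the
    custom class is not registered for deserialization."""
    clone = clone_model(model)              # rebuilds each layer from its get_config()
    clone.set_weights(model.get_weights())  # copy the current weights into the clone
    return clone

# When going through disk instead, register the custom layer explicitly:
# restored = load_model("han.h5",
#                       custom_objects={"AttentionWithContext": AttentionWithContext})
```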
4 votes, 1 answer

Hierarchical Attention Network - model.fit generates error 'ValueError: Input dimension mis-match'

For background, I am referring to the Hierarchical Attention Network used for sentiment classification. For code: my full code is posted below, but it is just a simple revision of the original code posted by the author at the link above. And I…
Ziqi • 2,445 • 5 • 38 • 65
4 votes, 1 answer

How can I pre-compute a mask for each input and adjust the weights according to this mask?

I want to provide a mask the same size as the input image, and adjust the weights learned from the image according to this mask (similar to attention, but pre-computed for each image input). How can I do this with Keras (or TensorFlow)?
dusa • 840 • 3 • 14 • 31
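One simple reading of the question's goal (a fixed, per-image attention map) is to feed the mask as a second input and multiply it into the feature maps. The sketch below is a hedged illustration in the Keras functional API; the layer sizes, shapes, and names are assumptions, not taken from the question.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Illustrative shapes: 64x64 RGB image plus a 64x64 single-channel mask.
image_in = layers.Input(shape=(64, 64, 3), name="image")
mask_in = layers.Input(shape=(64, 64, 1), name="mask")

x = layers.Conv2D(16, 3, padding="same", activation="relu")(image_in)
# Broadcast-multiply the precomputed mask into the feature maps, so
# masked-out regions contribute (almost) nothing downstream.
x = layers.Lambda(lambda t: t[0] * t[1])([x, mask_in])
x = layers.GlobalAveragePooling2D()(x)
out = layers.Dense(10, activation="softmax")(x)

model = Model(inputs=[image_in, mask_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
# model.fit([images, masks], labels, ...)  # the mask is supplied per example
```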
4 votes, 1 answer

Why does softmax get small gradients when the values are large, as in the paper 'Attention Is All You Need'?

This is a screenshot from the original paper. I understand the paper to mean that when the dot-product values are large, the gradient of the softmax becomes very small. However, I tried to calculate the gradient of…
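A small numerical check of the claim in the excerpt: when the logits fed to softmax have large magnitude (as happens when q·k grows with d_k and is not scaled by 1/√d_k), the softmax saturates and the gradient flowing back through it becomes tiny. The logits and scale factors below are arbitrary illustrations.

```python
import torch

def softmax_grad_norm(scale):
    # Logits with the same direction but different magnitude.
    logits = torch.tensor([1.0, 2.0, 3.0]) * scale
    logits.requires_grad_(True)
    probs = torch.softmax(logits, dim=0)
    # Backpropagate an arbitrary upstream gradient through the softmax.
    probs.backward(torch.tensor([1.0, 0.0, 0.0]))
    return logits.grad.abs().max().item()

for scale in (1, 5, 20, 100):
    print(scale, softmax_grad_norm(scale))
# The maximum gradient magnitude shrinks rapidly as the logits grow,
# which is why the paper divides q.k by sqrt(d_k) before the softmax.
```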
4 votes, 1 answer

Transformer - Attention Is All You Need - encoder-decoder cross-attention

It is my understanding that each encoder block takes the output from the previous encoder, and that the output is the attended representation (Z) of the sequence (aka sentence). My question is, how does the last encoder block produce K, V from Z…
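A hedged, single-head sketch of what the question calls producing K and V from Z: in cross-attention, the decoder supplies the queries, while the keys and values are linear projections of the final encoder output Z. The dimensions and module names below are illustrative, not the paper's exact multi-head configuration.

```python
import torch
import torch.nn as nn

d_model = 512

# Projection matrices owned by a decoder block's cross-attention sub-layer.
W_q = nn.Linear(d_model, d_model)
W_k = nn.Linear(d_model, d_model)
W_v = nn.Linear(d_model, d_model)

Z = torch.randn(1, 10, d_model)              # final encoder output (batch, src_len, d_model)
decoder_hidden = torch.randn(1, 7, d_model)  # decoder self-attention output (batch, tgt_len, d_model)

Q = W_q(decoder_hidden)   # queries come from the decoder
K = W_k(Z)                # keys come from the encoder output Z
V = W_v(Z)                # values come from the encoder output Z

attn = torch.softmax(Q @ K.transpose(-2, -1) / d_model ** 0.5, dim=-1)
context = attn @ V        # (batch, tgt_len, d_model): each target position attends over the source
print(context.shape)
```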
4 votes, 1 answer

AttentionDecoderRNN without MAX_LENGTH

From the PyTorch Seq2Seq tutorial (http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#attention-decoder), we see that the attention mechanism is heavily reliant on the MAX_LENGTH parameter to determine the output dimensions of…
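One common way around the fixed MAX_LENGTH in that tutorial is to score the decoder hidden state against each encoder output directly (dot-product, Luong-style attention), so the attention size follows the actual source length. This is a hedged alternative sketch, not the tutorial's own AttnDecoderRNN.

```python
import torch
import torch.nn.functional as F

def dot_product_attention(decoder_hidden, encoder_outputs):
    """decoder_hidden: (batch, hidden); encoder_outputs: (batch, src_len, hidden).
    src_len can vary from batch to batch, so no MAX_LENGTH is needed."""
    scores = torch.bmm(encoder_outputs, decoder_hidden.unsqueeze(2)).squeeze(2)  # (batch, src_len)
    weights = F.softmax(scores, dim=1)
    context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)        # (batch, hidden)
    return context, weights

# Example with an arbitrary source length of 13.
context, weights = dot_product_attention(torch.randn(2, 256), torch.randn(2, 13, 256))
print(context.shape, weights.shape)  # torch.Size([2, 256]) torch.Size([2, 13])
```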
3 votes, 0 answers

Segmentation fault (core dumped) with libpatches.so

Edit 3: loaded the core into gdb. Edit 2: included the .cc code. Edit 1: loaded debug symbols. I'm trying to run the example MNIST program of the attention-sampling GitHub library. The error output is as…
3 votes, 0 answers

Implementing 1D self attention in PyTorch

I'm trying to implement, using PyTorch, the 1D self-attention block proposed in the following paper. Below you can find my (provisional) attempt: import torch.nn as nn import torch #INPUT shape ((B), CH, H, W) class…
James Arten • 523 • 5 • 16
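Since the question's code is truncated, here is only a generic, hedged sketch of a 1D self-attention block operating on (batch, channels, length); it follows the SAGAN-style self-attention pattern adapted to 1D and is not a reconstruction of the block in the linked paper.

```python
import torch
import torch.nn as nn

class SelfAttention1d(nn.Module):
    """Minimal 1D self-attention over a (batch, channels, length) tensor."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv1d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv1d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv1d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual gate

    def forward(self, x):                            # x: (B, C, L)
        q = self.query(x).permute(0, 2, 1)            # (B, L, C//r)
        k = self.key(x)                               # (B, C//r, L)
        attn = torch.softmax(torch.bmm(q, k), dim=-1) # (B, L, L): position-to-position weights
        v = self.value(x)                             # (B, C, L)
        out = torch.bmm(v, attn.transpose(1, 2))      # (B, C, L)
        return self.gamma * out + x                   # residual connection

x = torch.randn(2, 64, 128)
print(SelfAttention1d(64)(x).shape)  # torch.Size([2, 64, 128])
```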
3 votes, 0 answers

How to use nn.MultiheadAttention together with nn.LSTM?

I'm trying to build a PyTorch network for image captioning. Currently I have a working network with an Encoder and a Decoder, and I want to add an nn.MultiheadAttention layer to it (to be used as self-attention). Currently my decoder looks like this: class…
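A hedged sketch of one way to combine the two modules in a captioning decoder: run the LSTM first, then apply nn.MultiheadAttention over its outputs as self-attention. The sizes and class are illustrative assumptions; note that nn.MultiheadAttention expects (seq_len, batch, embed_dim) unless batch_first=True (available in recent PyTorch).

```python
import torch
import torch.nn as nn

class DecoderWithSelfAttention(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=256, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                     # tokens: (batch, seq_len)
        x = self.embed(tokens)
        h, _ = self.lstm(x)                        # (batch, seq_len, hidden_dim)
        # Self-attention: queries, keys, and values are all the LSTM outputs.
        attended, weights = self.attn(h, h, h, need_weights=True)
        return self.fc(attended), weights

decoder = DecoderWithSelfAttention(vocab_size=1000)
logits, weights = decoder(torch.randint(0, 1000, (2, 12)))
print(logits.shape, weights.shape)  # (2, 12, 1000) (2, 12, 12)
```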
3 votes, 1 answer

Implementing a custom learning rate scheduler in PyTorch?

I would like to implement the learning rate method from the paper Attention Is All You Need. I have this code in TensorFlow, but I would like to implement it in PyTorch too. I know that PyTorch has modules for this…
Dametime • 581 • 1 • 6 • 23
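For reference, the schedule in the paper is lrate = d_model^(-0.5) · min(step^(-0.5), step · warmup_steps^(-1.5)). Below is a hedged sketch using torch.optim.lr_scheduler.LambdaLR, one of several reasonable ways to implement it; the stand-in model and hyperparameters are illustrative.

```python
import torch

d_model, warmup_steps = 512, 4000

def noam_lambda(step):
    step = max(step, 1)  # avoid division by zero on the very first call
    return (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)

model = torch.nn.Linear(10, 10)  # stand-in model for illustration
# lr=1.0 so the lambda fully determines the effective learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lambda)

for step in range(5):
    optimizer.step()     # ...after backprop, in a real training loop
    scheduler.step()     # updated per step, not per epoch
    print(step, optimizer.param_groups[0]["lr"])
```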
3 votes, 1 answer

attn_output_weights in MultiheadAttention

I want to know whether the attn_output_weights matrix can show the relationship between every word pair in the input sequence. In my project, I drew a heat map based on this output, and it looks like this: However, I can hardly see any…
Yuki Wang • 85 • 8
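A small, hedged sketch of how the returned weights relate to word pairs, assuming a recent PyTorch: with need_weights=True, nn.MultiheadAttention returns a (batch, tgt_len, src_len) matrix (averaged over heads by default), where row i is the distribution of token i's attention over all tokens. Whether that renders as a readable heat map of word-pair relationships depends on the trained model.

```python
import torch
import torch.nn as nn

embed_dim, num_heads, seq_len = 16, 4, 5
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(1, seq_len, embed_dim)   # stand-in for a sequence of word embeddings
out, attn_weights = mha(x, x, x, need_weights=True)  # weights averaged over heads by default

print(attn_weights.shape)          # (1, 5, 5): one row per query token
print(attn_weights.sum(dim=-1))    # each row sums to 1 (a distribution over key tokens)
# attn_weights[0, i, j] is how much token i attends to token j;
# this is the matrix typically rendered as a heat map.
```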
3 votes, 1 answer

Retrieve the "relevant tokens" with a BERT model (already fine-tuned)

I already fine-tuned a BERT model (with the Hugging Face library) for a classification task to predict a post's category as one of two types (1 and 0, for example). But I would need to retrieve the "relevant tokens" for the documents that are predicted as…
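One hedged approach among several (gradient-based saliency or Captum are alternatives): ask the fine-tuned model for its attention matrices with output_attentions=True and inspect how much attention the [CLS] token pays to each input token in the last layer. The checkpoint name and the choice of layer below are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "my-finetuned-bert"  # placeholder for the question's fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, output_attentions=True)
model.eval()

text = "example post to classify"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: tuple of (batch, heads, seq_len, seq_len), one entry per layer.
last_layer = outputs.attentions[-1]
# Attention paid by [CLS] (position 0) to every token, averaged over heads.
cls_attention = last_layer[0].mean(dim=0)[0]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in sorted(zip(tokens, cls_attention.tolist()), key=lambda t: -t[1])[:10]:
    print(f"{token}\t{score:.3f}")
```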
3 votes, 1 answer

Number of learnable parameters of MultiheadAttention

While testing (using PyTorch's MultiheadAttention), I noticed that increasing or decreasing the number of heads of the multi-head attention does not change the total number of learnable parameters of my model. Is this behavior correct? And if so,…
Elidor00 • 1,271 • 13 • 27
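The behavior described in the excerpt matches how nn.MultiheadAttention is defined: embed_dim is split across heads (head_dim = embed_dim // num_heads), so the projection matrices keep the same total size regardless of num_heads. A quick, hedged check with an arbitrary embed_dim:

```python
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

embed_dim = 512
for num_heads in (1, 2, 8, 16):
    mha = nn.MultiheadAttention(embed_dim, num_heads)
    print(num_heads, count_params(mha))
# Every line prints the same count: the input/output projections are always
# (embed_dim x embed_dim); only how they are sliced into heads changes.
```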
3 votes, 1 answer

AttributeError: can't set attribute. Hierarchical Attention Network

When I define the Hierarchical Attention Network, an error pops up which says "AttributeError: can't set attribute". Please help. This is the Attention.py file: import keras import Attention from keras.engine.topology import Layer,…