Questions tagged [self-attention]

57 questions
1
vote
0 answers

PyTorch's Transformer decoder accuracy fluctuation

I have a sequence-to-sequence POS tagging model which uses a Transformer decoder to generate target tokens. My implementation of PyTorch's Transformer decoder is as follows: in the initialization: self.decoder_layer =…
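(A minimal sketch of the kind of setup the question describes, assuming nn.TransformerDecoderLayer and hypothetical dimensions; it is not the asker's actual code:)

    import torch
    import torch.nn as nn

    # hypothetical sizes, for illustration only
    d_model, n_heads, n_layers, vocab_size = 256, 8, 2, 50

    decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
    decoder = nn.TransformerDecoder(decoder_layer, num_layers=n_layers)
    out_proj = nn.Linear(d_model, vocab_size)          # decoder states -> tag logits

    tgt = torch.randn(4, 20, d_model)                  # (batch, target length, d_model)
    memory = torch.randn(4, 20, d_model)               # encoder outputs
    tgt_mask = nn.Transformer.generate_square_subsequent_mask(20)

    logits = out_proj(decoder(tgt, memory, tgt_mask=tgt_mask))   # (4, 20, vocab_size)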
1
vote
2 answers

Visualizing ViT Attention maps after fine tuning on medical dataset

I have imported the Vit-b32 model and fine-tuned it to perform a classification task on echo images. Now I want to visualize the attention maps so that I can see which part of the image the model focuses on when classifying. But…
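(A rough sketch of one common approach, assuming a timm-style PyTorch ViT whose attention block applies an explicit softmax followed by an attn_drop submodule; module names are assumptions and newer versions with fused attention may skip this path entirely:)

    import timm
    import torch

    model = timm.create_model('vit_base_patch32_224', pretrained=False, num_classes=2)  # stand-in for the fine-tuned model
    image_batch = torch.randn(1, 3, 224, 224)                                           # stand-in for a preprocessed echo image

    attn_maps = []

    def grab(module, inputs, output):
        # for non-fused timm attention, the input to attn_drop is the softmaxed
        # attention matrix of shape (batch, heads, tokens, tokens)
        attn_maps.append(inputs[0].detach())

    handle = model.blocks[-1].attn.attn_drop.register_forward_hook(grab)
    with torch.no_grad():
        model(image_batch)
    handle.remove()

    attn = attn_maps[0].mean(dim=1)          # average over heads -> (B, tokens, tokens)
    cls_to_patches = attn[:, 0, 1:]          # CLS token's attention to the patch tokens
    side = int(cls_to_patches.shape[-1] ** 0.5)
    heatmap = cls_to_patches.reshape(-1, side, side)   # upsample and overlay on the image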
1
vote
0 answers

PyTorch Forecasting - Temporal Fusion Transformer calculate_prediction_actual_by_variable() plots empty

Referring to the tutorial (https://pytorch-forecasting.readthedocs.io/en/stable/tutorials/stallion.html) on the PyTorch Forecasting implementation of the Temporal Fusion Transformer, I'm trying to use their…
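(The usual flow from that tutorial is roughly the following; exact signatures and return types vary between pytorch-forecasting versions, so treat this as a sketch rather than a fix:)

    import matplotlib.pyplot as plt

    # tft: a trained TemporalFusionTransformer, val_dataloader: the validation loader
    predictions, x = tft.predict(val_dataloader, return_x=True)
    predictions_vs_actuals = tft.calculate_prediction_actual_by_variable(x, predictions)
    tft.plot_prediction_actual_by_variable(predictions_vs_actuals)
    plt.show()   # outside a notebook the figures can appear empty without an explicit show()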
1
vote
0 answers

One-head attention mechanism in PyTorch

I am trying to implement the attention mechanism using the CIFAR10 dataset. The idea is to implement an attention layer with only one head. As a reference, I used the multi-head implementation given…
Dew
  • 21
  • 3
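(A minimal single-head scaled dot-product attention sketch with hypothetical dimensions; it is essentially nn.MultiheadAttention with num_heads=1 written out by hand, not the asker's reference implementation:)

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SingleHeadAttention(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.q = nn.Linear(dim, dim)
            self.k = nn.Linear(dim, dim)
            self.v = nn.Linear(dim, dim)
            self.out = nn.Linear(dim, dim)

        def forward(self, x):                               # x: (batch, tokens, dim)
            q, k, v = self.q(x), self.k(x), self.v(x)
            scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
            attn = F.softmax(scores, dim=-1)                 # (batch, tokens, tokens)
            return self.out(attn @ v)

    # e.g. CIFAR-10 images split into 64 patch tokens of dimension 48 (hypothetical)
    x = torch.randn(8, 64, 48)
    print(SingleHeadAttention(48)(x).shape)                  # torch.Size([8, 64, 48])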
1
vote
0 answers

Masked self-attention in the transformer's decoder

I'm writing my thesis about attention mechanisms. In the paragraph in which I explain the transformer's decoder I wrote this: The first sub-layer is called masked self-attention, in which the masking operation consists of preventing the decoder…
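(A small worked example of what that masking does in practice, using a causal mask built with torch.triu; the sizes are illustrative:)

    import torch
    import torch.nn.functional as F

    T = 5                                              # target sequence length
    scores = torch.randn(T, T)                         # raw attention scores
    causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal_mask, float('-inf'))
    attn = F.softmax(scores, dim=-1)
    print(attn)   # row i has zero weight on positions j > i, so position i
                  # cannot attend to tokens that come after it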
1
vote
0 answers

Different node counts per mini-batch

I am fairly new to graph neural networks, and I am training a GNN model that uses self-attention. My problem is that the node count (node_num) differs in each batch; for example, in the first batch I have: Batch(batch=[1181],…
林深时
  • 11
  • 2
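(This is expected behaviour if the batches come from PyTorch Geometric: a Batch concatenates the graphs in the mini-batch, so the total node count changes whenever the individual graphs have different sizes. A small sketch, assuming torch_geometric:)

    import torch
    from torch_geometric.data import Data, Batch

    g1 = Data(x=torch.randn(3, 16), edge_index=torch.tensor([[0, 1], [1, 2]]))
    g2 = Data(x=torch.randn(5, 16), edge_index=torch.tensor([[0, 4], [2, 3]]))

    batch = Batch.from_data_list([g1, g2])
    print(batch.num_nodes)   # 8 -- the sum of the per-graph node counts
    print(batch.batch)       # tensor([0, 0, 0, 1, 1, 1, 1, 1]) maps each node to its graph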
1
vote
1 answer

Swin Transformer attention maps visualization

I am using a Swin Transformer for a hierarchical multi-class, multi-label classification problem. I would like to visualize the self-attention maps on my input image by extracting them from the model, but unfortunately I am not succeeding in this…
1
vote
1 answer

How to handle tensor multiplication with a None dimension

For example, I have two tensors A and B, both with shape (None, HWC). When I use tf.matmul(tf.transpose(A), B), the result has shape (HWC, HWC). This is correct, but I want to keep the None dimension so the result is (None, HWC, HWC). Is there…
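(One way to keep the batch dimension, as a sketch: take the per-sample outer product with expanded dims, or equivalently with tf.einsum:)

    import tensorflow as tf

    A = tf.random.normal([4, 10])      # stand-in for a (None, HWC) tensor
    B = tf.random.normal([4, 10])

    out = tf.matmul(tf.expand_dims(A, -1), tf.expand_dims(B, 1))   # (batch, HWC, HWC)
    same = tf.einsum('bi,bj->bij', A, B)                            # identical result
    print(out.shape, same.shape)        # (4, 10, 10) (4, 10, 10)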
1
vote
0 answers

How to use the multiple-heads option in the SelfAttention class?

I am playing around with the self-attention model from the trax library. When I set n_heads=1, everything works fine, but when I set n_heads=2, my code breaks. I use only input activations and one SelfAttention layer. Here is a minimal example: import…
Kenenbek Arzymatov
  • 8,439
  • 19
  • 58
  • 109
0
votes
0 answers

Error in PyTorch: mat1 and mat2 shapes cannot be multiplied

I'm working on a PyTorch project and I want to generate MNIST images using a U-Net architecture combined with a DDPM (denoising diffusion probabilistic model) approach. I'm encountering the following error: File…
Zahra Hosseini
  • 478
  • 2
  • 4
  • 14
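(That error usually means a Linear layer's in_features does not match the flattened size of the tensor reaching it. A quick way to see the offending shapes, as a sketch with hypothetical sizes rather than the asker's network:)

    import torch
    import torch.nn as nn

    x = torch.randn(2, 64, 7, 7)          # hypothetical feature map from the U-Net
    fc = nn.Linear(128, 10)               # expects 128 input features

    flat = x.flatten(1)                   # (2, 3136)
    print(flat.shape, fc.in_features)     # compare these two numbers first
    # fc(flat) would raise: mat1 and mat2 shapes cannot be multiplied (2x3136 and 128x10)
    # fix: nn.Linear(flat.shape[1], 10), or reshape/pool x to the expected size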
0
votes
0 answers

How to implement global self-attention with sparse tensors?

Using the following code, I am implementing global self-attention for sparse input with MinkowskiEngine. I am getting slightly worse results than the model without attention and wonder why. In particular, since in the last line of the code…
mrghafari
  • 35
  • 6
0
votes
1 answer

How do I make keras run a Dense layer for each row of an input matrix?

I'm trying to build a basic transformer using the Keras Attention layer. For this I need three different Dense layers, which generate the query, key and value matrices respectively by running every word embedding through them. But there seems to…
user2741831
  • 2,120
  • 2
  • 22
  • 43
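(For what it's worth, a Keras Dense layer already applies the same weights independently along the last axis of a 3-D input, so one Dense per projection is enough; a sketch with hypothetical sizes:)

    import tensorflow as tf
    from tensorflow.keras import layers

    seq_len, embed_dim, proj_dim = 12, 32, 32
    emb = tf.random.normal([2, seq_len, embed_dim])        # (batch, words, embedding)

    q_dense = layers.Dense(proj_dim)
    k_dense = layers.Dense(proj_dim)
    v_dense = layers.Dense(proj_dim)

    q, k, v = q_dense(emb), k_dense(emb), v_dense(emb)     # each (2, 12, 32), applied row by row
    out = layers.Attention()([q, v, k])                    # Keras Attention expects [query, value, key]
    print(out.shape)                                       # (2, 12, 32)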
0
votes
0 answers

How to visualize cross-attention maps to check text-image alignment?

I was wondering how to visualize the cross-attention map over image features that a model attends to given a text query (e.g. a sentence). There are some amazing explainability tools like Class Activation Maps, but they mostly require a 'class' or a CNN model…
0
votes
0 answers

Are the WQ, WK, WV matrices used to generate the query, key and value vectors for attention in Transformers fixed, or do they depend on the input word?

To calculate self-attention, for each word we create a query vector, a key vector, and a value vector. These vectors are created by multiplying the embedding by three matrices, WQ, WK and WV, that are learned during training…
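(In other words, WQ, WK and WV are fixed learned parameters shared across all positions; only the resulting q, k, v vectors depend on the input word. A sketch:)

    import torch
    import torch.nn as nn

    d_model = 8
    W_q = nn.Linear(d_model, d_model)
    W_k = nn.Linear(d_model, d_model)
    W_v = nn.Linear(d_model, d_model)

    embeddings = torch.randn(5, d_model)          # 5 input words
    q, k, v = W_q(embeddings), W_k(embeddings), W_v(embeddings)
    # the same three weight matrices are applied to every word; after training they
    # stay fixed at inference time, while q, k, v change with each new input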
0
votes
1 answer

Store intermediate values of a PyTorch module

I am trying to plot attention maps for a ViT. I know that I can do something like h_attn = model.blocks[-1].attn.register_forward_hook(get_activations('attention')) to register a hook that captures the output of some nn.Module in my model. The ViT's attention…
Mitch
  • 27
  • 5
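(The usual pattern for this kind of hook, as a sketch; get_activations here is a hypothetical helper that stores outputs in a dict, and model/images stand for the question's ViT and a preprocessed input batch:)

    import torch

    activations = {}

    def get_activations(name):
        def hook(module, inputs, output):
            activations[name] = output.detach() if torch.is_tensor(output) else output
            # note: many ViT attention modules return only the projected tokens, so the
            # attention matrix itself may need a hook on a submodule deeper in the block
        return hook

    h_attn = model.blocks[-1].attn.register_forward_hook(get_activations('attention'))
    with torch.no_grad():
        model(images)
    h_attn.remove()
    print(activations['attention'].shape)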