Questions about attention mechanisms in deep learning
Questions tagged [attention-model]
389 questions
9
votes
2 answers
Outputting attention for bert-base-uncased with huggingface/transformers (torch)
I was following a paper on BERT-based lexical substitution (specifically trying to implement equation (2) - if someone has already implemented the whole paper that would also be great). Thus, I wanted to obtain both the last hidden layers (only…

Björn
- 644
- 10
- 23
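A minimal sketch of getting both the per-layer attention weights and the hidden states out of bert-base-uncased, assuming transformers 4.x (the example sentence is made up):

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained(
    "bert-base-uncased",
    output_attentions=True,      # return attention weights for every layer
    output_hidden_states=True,   # return the embeddings plus all 12 layer outputs
)
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: tuple of 12 tensors, each (batch, heads, seq_len, seq_len)
# outputs.hidden_states: tuple of 13 tensors, each (batch, seq_len, 768)
last_hidden = outputs.hidden_states[-1]
print(len(outputs.attentions), outputs.attentions[0].shape, last_hidden.shape)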
8
votes
1 answer
MultiHeadAttention attention_mask [Keras, Tensorflow] example
I am struggling to mask my input for the MultiHeadAttention layer. I am using the Transformer block from the Keras documentation with self-attention. I could not find any example code online so far and would appreciate it if someone could give me a code…

R. Giskard
- 91
- 1
- 5
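A hedged sketch of what the attention_mask argument expects, with made-up shapes: a boolean tensor broadcastable to (batch, query_positions, key_positions), True where attention is allowed.

import tensorflow as tf

batch, seq_len, dim = 2, 6, 16
x = tf.random.normal((batch, seq_len, dim))
# hypothetical padding pattern: trailing positions of each sequence are padding
valid = tf.constant([[1, 1, 1, 1, 0, 0],
                     [1, 1, 1, 0, 0, 0]], dtype=tf.bool)

# combine the per-position mask into a (batch, query, key) self-attention mask
mask = valid[:, tf.newaxis, :] & valid[:, :, tf.newaxis]

mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=dim)
out, scores = mha(query=x, value=x, attention_mask=mask,
                  return_attention_scores=True)
print(out.shape, scores.shape)  # (2, 6, 16) and (2, 4, 6, 6)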
8
votes
1 answer
Different `grad_fn` for similar looking operations in Pytorch (1.0)
I am working on an attention model, and before running the final model, I was going through the tensor shapes that flow through the code. I have an operation where I need to reshape the tensor. The tensor is of the shape torch.Size([30, 8, 9,…

abkds
- 1,764
- 7
- 27
- 43
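For context, a small sketch of why two similar-looking reshapes can show different grad_fn nodes: autograd records how the output memory was produced, not just its final shape (the shape below is loosely borrowed from the question; exact grad_fn names vary by PyTorch version).

import torch

x = torch.randn(30, 8, 9, 64, requires_grad=True)

a = x.view(30, 8, 9 * 64)                          # metadata-only change on a contiguous tensor
b = x.permute(0, 2, 1, 3).reshape(30, 9, 8 * 64)   # permute breaks contiguity, so reshape copies

print(a.grad_fn)   # e.g. <ViewBackward0>
print(b.grad_fn)   # e.g. <UnsafeViewBackward0> / <CloneBackward0>, depending on version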
7
votes
1 answer
Inputs to the nn.MultiheadAttention?
I have n vectors which need to be influenced by each other and output n vectors with the same dimensionality d. I believe this is what torch.nn.MultiheadAttention does. But the forward function expects query, key and value as inputs. According to this…

angryweasel
- 316
- 2
- 10
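A minimal self-attention sketch with nn.MultiheadAttention, using made-up sizes: for n vectors attending to each other, query, key and value are all the same tensor, and the default input layout is (seq_len, batch, embed_dim).

import torch
import torch.nn as nn

n, d, num_heads = 10, 64, 8       # hypothetical: 10 vectors of dimension 64
x = torch.randn(n, 1, d)          # (seq_len, batch, embed_dim)

mha = nn.MultiheadAttention(embed_dim=d, num_heads=num_heads)
out, weights = mha(query=x, key=x, value=x)

print(out.shape)      # torch.Size([10, 1, 64]) - n vectors, same dimensionality d
print(weights.shape)  # torch.Size([1, 10, 10]) - attention weights averaged over heads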
7
votes
2 answers
Sequence to Sequence - for time series prediction
I've tried to build a sequence-to-sequence model to predict a sensor signal over time based on its first few inputs (see figure below).
The model works OK, but I want to 'spice things up' and try to add an attention layer between the two LSTM…

Roni Gadot
- 437
- 2
- 19
- 30
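Not the asker's model, just a hedged functional-API sketch of one way to wire tf.keras.layers.Attention (dot-product attention) between an encoder and a decoder LSTM, with made-up window sizes:

import tensorflow as tf
from tensorflow.keras import layers

in_steps, out_steps, n_features, units = 30, 10, 1, 64   # hypothetical sizes

enc_in = layers.Input(shape=(in_steps, n_features))
enc_seq, state_h, state_c = layers.LSTM(units, return_sequences=True,
                                        return_state=True)(enc_in)

dec_in = layers.Input(shape=(out_steps, n_features))
dec_seq = layers.LSTM(units, return_sequences=True)(dec_in,
                                                    initial_state=[state_h, state_c])

# decoder states query the encoder states
context = layers.Attention()([dec_seq, enc_seq])
merged = layers.Concatenate()([dec_seq, context])
pred = layers.TimeDistributed(layers.Dense(n_features))(merged)

model = tf.keras.Model([enc_in, dec_in], pred)
model.compile(optimizer="adam", loss="mse")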
7
votes
0 answers
Implementing attention in Keras classification
I would like to add attention to a trained image-classification CNN model. For example, there are 30 classes and, with the Keras CNN, I obtain the predicted class for each image. However, to visualize the important features/locations of the…

TheJokerAEZ
- 361
- 1
- 3
- 9
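Not attention in the layer sense, but a common way to visualize the important locations of an already-trained CNN classifier is Grad-CAM; a rough sketch using a stock Keras model and a random stand-in image (the conv layer name "Conv_1" is specific to MobileNetV2):

import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")
grad_model = tf.keras.Model(model.input,
                            [model.get_layer("Conv_1").output, model.output])

img = np.random.rand(1, 224, 224, 3).astype("float32")   # stand-in for a real image
with tf.GradientTape() as tape:
    conv_maps, preds = grad_model(img)
    class_score = tf.gather(preds, tf.argmax(preds[0]), axis=1)

grads = tape.gradient(class_score, conv_maps)      # d(score) / d(feature maps)
weights = tf.reduce_mean(grads, axis=(1, 2))       # global-average-pool the gradients
cam = tf.reduce_sum(conv_maps * weights[:, None, None, :], axis=-1)
cam = tf.nn.relu(cam) / (tf.reduce_max(cam) + 1e-8)   # normalised heatmap per image
print(cam.shape)   # (1, 7, 7) for MobileNetV2 - upsample and overlay on the input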
7
votes
2 answers
How to visualize attention weights?
Using this implementation
I have included attention in my RNN (which classifies the input sequences into two classes) as follows.
visible = Input(shape=(250,))
embed = Embedding(vocab_size, 100)(visible)
activations = keras.layers.GRU(250,…

Stupid420
- 1,347
- 3
- 19
- 44
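One way to expose the weights for plotting is to give the softmax layer a name and build a second Model that outputs it; a hedged, self-contained sketch of that pattern (layer sizes copied from the excerpt, everything else is made up):

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, seq_len = 5000, 250   # hypothetical vocab; sequence length from Input(shape=(250,))

visible = layers.Input(shape=(seq_len,))
embed = layers.Embedding(vocab_size, 100)(visible)
activations = layers.GRU(250, return_sequences=True)(embed)

scores = layers.Dense(1, activation="tanh")(activations)
scores = layers.Softmax(axis=1, name="attention_weights")(scores)    # one weight per timestep
context = layers.Flatten()(layers.Dot(axes=1)([scores, activations]))
output = layers.Dense(2, activation="softmax")(context)
model = tf.keras.Model(visible, output)

# second model that exposes the attention weights for visualization
attn_model = tf.keras.Model(model.input, model.get_layer("attention_weights").output)
weights = attn_model.predict(np.random.randint(0, vocab_size, size=(1, seq_len)))
print(weights.shape)   # (1, 250, 1) - ready to plot against the input tokens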
6
votes
1 answer
Keras, model trains successfully but generating predictions gives ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor
I created a Seq2Seq model for text summarization. I have two models, one with attention and one without. The one without attention was able to generate predictions, but I can't for the one with attention even though it fits successfully.
This…

BlueMango
- 463
- 7
- 21
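For reference, a miniature reproduction (not the asker's model) of what "Graph disconnected" usually means here: the output tensor also depends on encoder tensors that are not reachable from the inputs handed to Model, which is why the attention variant fails at inference time.

import tensorflow as tf
from tensorflow.keras import layers

enc_in = layers.Input(shape=(10, 8))
enc_seq, h, c = layers.LSTM(32, return_sequences=True, return_state=True)(enc_in)

dec_in = layers.Input(shape=(None, 8))
dec_seq = layers.LSTM(32, return_sequences=True)(dec_in, initial_state=[h, c])
attn = layers.Attention()([dec_seq, enc_seq])

train_model = tf.keras.Model([enc_in, dec_in], attn)   # fine: all inputs listed

try:
    bad_model = tf.keras.Model(dec_in, attn)            # attn also needs enc_in
except ValueError as e:
    print(str(e)[:60])                                   # "Graph disconnected: ..."

The usual fix is to build the inference decoder as its own Model whose Inputs include the encoder outputs and states, and feed it the encoder's predictions step by step.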
6
votes
0 answers
Getting an error while converting code from TF1 to TF2
Where the values are
rnn_size: 512
batch_size: 128
rnn_inputs: Tensor("embedding_lookup/Identity_1:0", shape=(?, ?, 128), dtype=float32)
sequence_length: Tensor("inputs_length:0", shape=(?,), dtype=int32)
cell_fw:…

Args
- 73
- 5
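The excerpt's cell_fw / sequence_length / rnn_inputs suggest a TF1 tf.nn.bidirectional_dynamic_rnn call; a hedged sketch of the rough TF2 equivalent, with a made-up time dimension:

import tensorflow as tf

rnn_size, batch_size = 512, 128
rnn_inputs = tf.random.normal((batch_size, 20, 128))   # (batch, time, features); time is made up
sequence_length = tf.fill((batch_size,), 20)
mask = tf.sequence_mask(sequence_length, maxlen=20)

bi_rnn = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(rnn_size, return_sequences=True, return_state=True))
enc_output, fw_h, fw_c, bw_h, bw_c = bi_rnn(rnn_inputs, mask=mask)
print(enc_output.shape)   # (128, 20, 1024): forward and backward outputs concatenated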
6
votes
0 answers
Where should we put attention in an autoencoder?
In this tutorial on the TensorFlow site we can see code implementing an autoencoder whose Decoder is as follows:
class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder,…

Marzi Heidari
- 2,660
- 4
- 25
- 57
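For reference, the linked tutorial's pattern puts a Bahdanau-style attention layer inside the Decoder, between the encoder outputs and the decoder GRU; a condensed sketch of that attention layer (not a claim about where it must go in an autoencoder):

import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # query: decoder hidden state (batch, units); values: encoder outputs (batch, time, units)
        score = self.V(tf.nn.tanh(self.W1(values) + self.W2(tf.expand_dims(query, 1))))
        weights = tf.nn.softmax(score, axis=1)
        context = tf.reduce_sum(weights * values, axis=1)   # weighted sum of encoder outputs
        return context, weights

In the tutorial's Decoder.call, the context vector is concatenated with the embedded decoder input before the GRU, so the attention sits between the encoder outputs and the decoder RNN.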
6
votes
2 answers
How can I add tf.keras.layers.AdditiveAttention in my model?
I am working on a machine-translation problem. The model I am using is:
Model = Sequential([
    Embedding(english_vocab_size, 256, input_length=english_max_len, mask_zero=True),
    LSTM(256, activation='relu'),
    …
user14349917
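AdditiveAttention takes two inputs ([query, value]), so it does not drop into a single-input Sequential stack; a hedged functional-API sketch with made-up vocabulary sizes:

import tensorflow as tf
from tensorflow.keras import layers

english_vocab_size, target_vocab_size = 8000, 9000    # hypothetical
english_max_len, target_max_len = 20, 20              # hypothetical

enc_in = layers.Input(shape=(english_max_len,))
enc_emb = layers.Embedding(english_vocab_size, 256, mask_zero=True)(enc_in)
enc_seq, enc_h, enc_c = layers.LSTM(256, return_sequences=True, return_state=True)(enc_emb)

dec_in = layers.Input(shape=(target_max_len,))
dec_emb = layers.Embedding(target_vocab_size, 256, mask_zero=True)(dec_in)
dec_seq = layers.LSTM(256, return_sequences=True)(dec_emb, initial_state=[enc_h, enc_c])

context = layers.AdditiveAttention()([dec_seq, enc_seq])   # Bahdanau-style scores
out = layers.Dense(target_vocab_size, activation="softmax")(
    layers.Concatenate()([dec_seq, context]))

model = tf.keras.Model([enc_in, dec_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")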
6
votes
1 answer
Implementing Luong Attention in PyTorch
I am trying to implement the attention described in Luong et al. 2015 in PyTorch myself, but I couldn't get it to work. Below is my code; I am only interested in the "general" attention case for now. I wonder if I am missing any obvious error. It runs,…

zyxue
- 7,904
- 5
- 48
- 74
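A compact sketch of the "general" score from Luong et al. 2015, score(h_t, h_s) = h_t^T W_a h_s, with made-up sizes:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LuongGeneralAttention(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.W_a = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden: (batch, 1, hidden); encoder_outputs: (batch, src_len, hidden)
        scores = torch.bmm(self.W_a(decoder_hidden), encoder_outputs.transpose(1, 2))
        weights = F.softmax(scores, dim=-1)              # (batch, 1, src_len)
        context = torch.bmm(weights, encoder_outputs)    # (batch, 1, hidden)
        return context, weights

attn = LuongGeneralAttention(128)
ctx, w = attn(torch.randn(4, 1, 128), torch.randn(4, 10, 128))
print(ctx.shape, w.shape)   # torch.Size([4, 1, 128]) torch.Size([4, 1, 10])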
6
votes
0 answers
Attention in Tensorflow (tf.contrib.rnn.AttentionCellWrapper)
How exactly is tf.contrib.rnn.AttentionCellWrapper used? Can someone give a piece of example code?
Specifically, I only managed to make the following
fwd_cell =…

user3373273
- 61
- 1
- 3
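A short TF 1.x sketch (tf.contrib was removed in TF 2.x), with made-up sizes, of wrapping a cell and running it through dynamic_rnn:

import tensorflow as tf   # TF 1.x only

num_units, attn_length = 128, 16   # hypothetical sizes

fwd_cell = tf.contrib.rnn.BasicLSTMCell(num_units, state_is_tuple=True)
fwd_cell = tf.contrib.rnn.AttentionCellWrapper(fwd_cell, attn_length,
                                               state_is_tuple=True)

inputs = tf.placeholder(tf.float32, [None, None, 64])   # (batch, time, features)
seq_len = tf.placeholder(tf.int32, [None])
outputs, state = tf.nn.dynamic_rnn(fwd_cell, inputs,
                                   sequence_length=seq_len, dtype=tf.float32)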
6
votes
0 answers
How to load a matrix to change the attention layer in seqToseq demo? - Paddle
While attempting to replicate Section 3.1 of Incorporating Discrete Translation Lexicons into Neural MT in paddle-paddle,
I tried to create a static matrix that I need to load into the seqToseq training pipeline, e.g.:
>>> import numpy as np
>>>…

alvas
- 115,346
- 109
- 446
- 738