Questions tagged [attention-model]

Questions about the attention mechanism in deep learning models

389 questions
9 votes, 2 answers

Outputting attention for bert-base-uncased with huggingface/transformers (torch)

I was following a paper on BERT-based lexical substitution (specifically trying to implement equation (2) - if someone has already implemented the whole paper that would also be great). Thus, I wanted to obtain both the last hidden layers (only…
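Not the paper's implementation, but a minimal sketch of how to get both the per-layer attention matrices and the hidden states out of bert-base-uncased with huggingface/transformers; the example sentence is made up, and on older library versions the model returns plain tuples instead of a ModelOutput.

```python
# Hedged sketch: request attentions and hidden states from bert-base-uncased.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained(
    "bert-base-uncased",
    output_attentions=True,      # return the attention weights of every layer
    output_hidden_states=True,   # return the hidden states of every layer
)
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Tuple of 12 tensors, each (batch, num_heads, seq_len, seq_len)
attentions = outputs.attentions
# Tuple of 13 tensors (embeddings + 12 layers), each (batch, seq_len, hidden_size)
hidden_states = outputs.hidden_states
print(attentions[-1].shape, hidden_states[-1].shape)
```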
8 votes, 1 answer

MultiHeadAttention attention_mask [Keras, Tensorflow] example

I am struggling to mask my input for the MultiHeadAttention Layer. I am using the Transformer Block from the Keras documentation with self-attention. I could not find any example code online so far and would appreciate it if someone could give me a code…
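A hedged sketch of one way to pass a padding mask to keras MultiHeadAttention inside a small self-attention model; the vocabulary size, sequence length and head count are assumptions, not taken from the question.

```python
# Hedged sketch: padding-masked self-attention with keras MultiHeadAttention.
import tensorflow as tf

vocab_size, maxlen, embed_dim, num_heads = 1000, 20, 32, 2

inputs = tf.keras.Input(shape=(maxlen,), dtype="int32")
x = tf.keras.layers.Embedding(vocab_size, embed_dim)(inputs)

# attention_mask must broadcast to (batch, num_heads, query_len, key_len);
# a boolean (batch, query_len, key_len) mask built from the padding is enough.
padding_mask = tf.not_equal(inputs, 0)                      # (batch, maxlen)
attn_mask = tf.logical_and(
    tf.expand_dims(padding_mask, 1),                        # (batch, 1, maxlen)
    tf.expand_dims(padding_mask, 2))                        # (batch, maxlen, 1)

attn_out = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(
    query=x, value=x, key=x, attention_mask=attn_mask)

outputs = tf.keras.layers.GlobalAveragePooling1D()(attn_out)
model = tf.keras.Model(inputs, outputs)
model.summary()
```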
8 votes, 1 answer

Different `grad_fn` for similar looking operations in Pytorch (1.0)

I am working on an attention model, and before running the final model, I was going through the tensor shapes which flow through the code. I have an operation where I need to reshape the tensor. The tensor is of the shape torch.Size([[30, 8, 9,…
abkds • 1,764 • 7 • 27 • 43
7 votes, 1 answer

Inputs to the nn.MultiheadAttention?

I have n vectors which need to be influenced by each other and to output n vectors with the same dimensionality d. I believe this is what torch.nn.MultiheadAttention does. But the forward function expects query, key and value as inputs. According to this…
angryweasel • 316 • 2 • 10
7 votes, 2 answers

Sequence to Sequence - for time series prediction

I've tried to build a sequence-to-sequence model to predict a sensor signal over time based on its first few inputs (see figure below). The model works OK, but I want to 'spice things up' and try to add an attention layer between the two LSTM…
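One way to 'spice things up', sketched under assumed sequence lengths and layer sizes: keep both LSTMs, let the decoder outputs attend over the encoder outputs with Keras' dot-product Attention layer, and predict from the concatenation.

```python
# Hedged sketch: attention between encoder and decoder LSTMs for time series.
import tensorflow as tf

enc_steps, dec_steps, n_features, units = 50, 10, 1, 64

enc_in = tf.keras.Input(shape=(enc_steps, n_features))
enc_out, state_h, state_c = tf.keras.layers.LSTM(
    units, return_sequences=True, return_state=True)(enc_in)

dec_in = tf.keras.Input(shape=(dec_steps, n_features))
dec_out = tf.keras.layers.LSTM(units, return_sequences=True)(
    dec_in, initial_state=[state_h, state_c])

# Dot-product attention: each decoder step attends over all encoder steps.
context = tf.keras.layers.Attention()([dec_out, enc_out])
concat = tf.keras.layers.Concatenate()([dec_out, context])
preds = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))(concat)

model = tf.keras.Model([enc_in, dec_in], preds)
model.compile(optimizer="adam", loss="mse")
```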
7 votes, 0 answers

Implementing attention in Keras classification

I would like to add attention to a trained image-classification CNN model. For example, there are 30 classes and with the Keras CNN, I obtain for each image the predicted class. However, to visualize the important features/locations of the…
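A hedged sketch of one common option: add a learned spatial-attention pooling head on top of the backbone's last convolutional feature map and visualise that map per image. The backbone, input size and layer names below are assumptions, not the asker's model.

```python
# Hedged sketch: spatial-attention pooling over a CNN feature map, plus a
# sub-model that returns the attention weights for visualisation.
import tensorflow as tf

inputs = tf.keras.Input(shape=(128, 128, 3))
features = tf.keras.applications.MobileNetV2(
    include_top=False, weights=None, input_tensor=inputs).output   # (4, 4, 1280)

feat_flat = tf.keras.layers.Reshape((16, 1280))(features)   # 16 spatial locations
scores = tf.keras.layers.Dense(1)(feat_flat)                 # one score per location
scores = tf.keras.layers.Flatten()(scores)                   # (batch, 16)
attn = tf.keras.layers.Softmax(name="attention_map")(scores)

# Attention-weighted pooling instead of plain global average pooling.
pooled = tf.keras.layers.Dot(axes=(1, 1))([attn, feat_flat])          # (batch, 1280)
outputs = tf.keras.layers.Dense(30, activation="softmax")(pooled)     # 30 classes

model = tf.keras.Model(inputs, outputs)
# For visualisation: reshape the 16 weights back to the 4x4 feature-map grid.
viz_model = tf.keras.Model(inputs, model.get_layer("attention_map").output)
```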
7 votes, 2 answers

How to visualize attention weights?

Using this implementation I have included attention in my RNN (which classifies the input sequences into two classes) as follows. visible = Input(shape=(250,)) embed = Embedding(vocab_size, 100)(visible) activations = keras.layers.GRU(250,…
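A hedged sketch of one way to expose the weights: give the softmax layer a name, build a second Model that outputs it, and plot the result. The attention mechanism below is a simple stand-in, and vocab_size is an assumed value.

```python
# Hedged sketch: make the attention weights a named layer output and plot them.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

vocab_size = 5000  # assumption

visible = tf.keras.Input(shape=(250,))
embed = tf.keras.layers.Embedding(vocab_size, 100)(visible)
activations = tf.keras.layers.GRU(250, return_sequences=True)(embed)

# Simple additive attention: one score per timestep, softmaxed over time.
scores = tf.keras.layers.Dense(1, activation="tanh")(activations)
scores = tf.keras.layers.Flatten()(scores)
weights = tf.keras.layers.Softmax(name="attn_weights")(scores)        # (batch, 250)
context = tf.keras.layers.Dot(axes=(1, 1))([weights, activations])    # weighted sum
output = tf.keras.layers.Dense(1, activation="sigmoid")(context)

model = tf.keras.Model(visible, output)

# Second model that returns the attention weights for any input sequence.
attn_model = tf.keras.Model(visible, model.get_layer("attn_weights").output)
w = attn_model.predict(np.random.randint(0, vocab_size, size=(1, 250)))
plt.bar(range(250), w[0])
plt.xlabel("timestep"); plt.ylabel("attention weight"); plt.show()
```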
6 votes, 1 answer

Keras, model trains successfully but generating predictions gives ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor

I created a Seq2Seq model for text summarization. I have two models, one with attention and one without. The one without attention was able to generate predictions but I can't do it for the one with attention even though it fits successfully. This…
BlueMango • 463 • 7 • 21
6 votes, 0 answers

Getting error while converting a code in tf1 to tf2

Where the values are rnn_size: 512 batch_size: 128 rnn_inputs: Tensor("embedding_lookup/Identity_1:0", shape=(?, ?, 128), dtype=float32) sequence_length: Tensor("inputs_length:0", shape=(?,), dtype=int32) cell_fw:…
Args • 73 • 5
6 votes, 0 answers

Where should we put attention in an autoencoder?

In this tutorial on the TensorFlow site we can see code for the implementation of an autoencoder whose Decoder is as follows: class Decoder(tf.keras.Model): def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz): super(Decoder,…
Marzi Heidari • 2,660 • 4 • 25 • 57
6 votes, 2 answers

How can I add tf.keras.layers.AdditiveAttention in my model?

I am working on a machine language translation problem. The Model I am using is: Model = Sequential([ Embedding(english_vocab_size, 256, input_length=english_max_len, mask_zero=True), LSTM(256, activation='relu'), …
user14349917
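A hedged sketch of one way to fit tf.keras.layers.AdditiveAttention into such a model: the Sequential stack is rewritten with the functional API so that the encoder returns its per-timestep outputs, which the attention layer needs; the target-side names and all sizes below are assumptions.

```python
# Hedged sketch: AdditiveAttention between encoder and decoder LSTM sequences.
import tensorflow as tf

english_vocab_size, english_max_len = 8000, 20   # assumed values
target_vocab_size, target_max_len = 10000, 20    # assumed values

enc_in = tf.keras.Input(shape=(english_max_len,))
enc_emb = tf.keras.layers.Embedding(english_vocab_size, 256, mask_zero=True)(enc_in)
enc_seq, enc_h, enc_c = tf.keras.layers.LSTM(
    256, return_sequences=True, return_state=True)(enc_emb)

dec_in = tf.keras.Input(shape=(target_max_len,))
dec_emb = tf.keras.layers.Embedding(target_vocab_size, 256, mask_zero=True)(dec_in)
dec_seq = tf.keras.layers.LSTM(256, return_sequences=True)(
    dec_emb, initial_state=[enc_h, enc_c])

# query = decoder sequence, value (and key) = encoder sequence.
context = tf.keras.layers.AdditiveAttention()([dec_seq, enc_seq])
merged = tf.keras.layers.Concatenate()([dec_seq, context])
out = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(target_vocab_size, activation="softmax"))(merged)

model = tf.keras.Model([enc_in, dec_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```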
6 votes, 1 answer

Implementing Luong Attention in PyTorch

I am trying to implement the attention described in Luong et al. 2015 in PyTorch myself, but I couldn't get it to work. Below is my code; I am only interested in the "general" attention case for now. I wonder if I am missing any obvious error. It runs,…
zyxue • 7,904 • 5 • 48 • 74
6 votes, 1 answer

How to add attention layer to seq2seq model on Keras

Based on this article, I wrote this…
Osm • 81 • 4
6 votes, 0 answers

Attention in Tensorflow (tf.contrib.rnn.AttentionCellWrapper)

How exactly is tf.contrib.rnn.AttentionCellWrapper used? Can someone give a piece of example code? Specifically, I only managed to make the following fwd_cell =…
user3373273 • 61 • 1 • 3
6 votes, 0 answers

How to load a matrix to change the attention layer in seqToseq demo? - Paddle

While attempting to replicate the section 3.1 in Incorporating Discrete Translation Lexicons into Neural MT in paddle-paddle I tried to have a static matrix that I'll need to load into the seqToseq training pipeline, e.g.: >>> import numpy as np >>>…
alvas • 115,346 • 109 • 446 • 738