Questions regarding the attention mechanism in deep learning
Questions tagged [attention-model]
389 questions
2
votes
1 answer
Some parameters are not getting saved when saving a model in pytorch
I have built an encoder-decoder model with attention for morph inflection generation. I am able to train the model and predict on test data, but I am getting wrong predictions after loading a saved model.
I am not getting any error during saving or…

Umang Jain
- 21
- 5
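A frequent cause of this symptom, offered here only as a hedged sketch rather than a diagnosis of the asker's model: tensors created as plain attributes in __init__ never appear in state_dict(), so after loading they silently keep a fresh random initialization. Only nn.Parameter objects, registered buffers, and submodules are saved.

import torch
import torch.nn as nn

class ToyAttention(nn.Module):
    """Toy module showing which tensors are included in state_dict()."""
    def __init__(self, hidden_size):
        super().__init__()
        # nn.Parameter: saved in state_dict() and updated by the optimizer.
        self.attn_weights = nn.Parameter(torch.randn(hidden_size, hidden_size))
        # A plain attribute such as `self.scale = torch.tensor(1.0)` would NOT
        # be saved; register non-trainable tensors as buffers instead.
        self.register_buffer("scale", torch.tensor(1.0))

    def forward(self, x):
        return (x @ self.attn_weights) * self.scale

model = ToyAttention(8)
torch.save(model.state_dict(), "model.pt")

restored = ToyAttention(8)
restored.load_state_dict(torch.load("model.pt"))  # strict=True flags missing/unexpected keys
restored.eval()  # also call eval() before predicting, to disable dropout etc.

Comparing model.state_dict().keys() against the parameters you expect to be persisted is a quick way to spot which ones are being lost.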
2
votes
1 answer
Attention layer output shape issue
I have been using BiLSTMs to classify each word in sentences, and my input has shape (n_sentences, max_sequence_length, classes). Recently, I have been trying to use this attention layer:…

D. Clem
- 85
- 1
- 6
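Shape problems with attention on top of a BiLSTM usually come down to whether the attention layer collapses the time axis into one context vector or keeps one vector per word. A minimal sketch that preserves the time axis for per-word classification, assuming TensorFlow's built-in layers.Attention and hypothetical dimensions:

import tensorflow as tf
from tensorflow.keras import layers

max_len, vocab_size, n_classes, emb_dim = 50, 10000, 5, 100  # hypothetical sizes

inputs = layers.Input(shape=(max_len,))
x = layers.Embedding(vocab_size, emb_dim, mask_zero=True)(inputs)
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)   # (batch, max_len, 128)

# Dot-product self-attention over the BiLSTM states: passing [h, h] keeps the
# query (time) axis, so the output stays (batch, max_len, 128).
context = layers.Attention()([h, h])

# One prediction per word, as required for per-word tagging.
outputs = layers.TimeDistributed(layers.Dense(n_classes, activation="softmax"))(context)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()

If the attention layer being used instead returns a single (batch, features) vector, it is doing sequence-level pooling and cannot feed a per-word classifier without reworking it.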
2
votes
0 answers
Graph disconnect in inference in Keras RNN + Encoder/Decoder + Attention
I've successfully trained a model in Keras using an encoder/decoder structure + attention + GloVe, following several examples, most notably this one and this one. It's based on a modified machine-translation setup. This is a chatbot, so the input is…

a1orona
- 21
- 2
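As a general note (a toy sketch of the principle, not the chatbot model itself): a "Graph disconnected" error when building the inference models usually means some tensor in the new Model traces back to the training graph's inputs instead of the inference graph's own Input layers. Shared layers should be re-called on fresh Inputs; only the weights are reused.

import tensorflow as tf
from tensorflow.keras import layers, Model

# A layer that is trained as part of the full training-time model.
shared_dense = layers.Dense(8)

train_in = layers.Input(shape=(4,))
train_out = shared_dense(train_in)
train_model = Model(train_in, train_out)

# WRONG: building an inference Model that mixes `train_out` (which depends on
# `train_in`) with a new Input raises "Graph disconnected".
# RIGHT: give the inference model its own Input and call the shared layer again.
infer_in = layers.Input(shape=(4,))
infer_out = shared_dense(infer_in)   # same trained weights, new graph
infer_model = Model(infer_in, infer_out)
infer_model.summary()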
2
votes
2 answers
Is the attention mechanism really attention, or just looking back at memory again?
When reading about the attention mechanism, I am confused by the term "attention". Is it the same as attention in its usual, human sense?

Giang Nguyen
- 450
- 8
- 17
2
votes
2 answers
What is used to train a self-attention mechanism?
I've been trying to understand self-attention, but nothing I have found explains the concept well at a high level.
Let's say we use self-attention in an NLP task, so our input is a sentence.
Then self-attention can be used to measure how…
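In short, nothing extra is needed to train self-attention: the query/key/value projection matrices are the trainable pieces, and they are learned by backpropagating whatever task loss sits on top (e.g. cross-entropy for classification or translation). A minimal single-head sketch in PyTorch:

import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head self-attention; the only trainable parts are w_q, w_k, w_v."""
    def __init__(self, d_model):
        super().__init__()
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = scores.softmax(dim=-1)           # how much each word attends to every other word
        return weights @ v                         # weighted sum of value vectors

x = torch.randn(2, 7, 32)                          # e.g. 2 sentences, 7 tokens, 32-dim embeddings
out = SelfAttention(32)(x)
print(out.shape)                                   # torch.Size([2, 7, 32])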
2
votes
4 answers
Keras: How to display attention weights in LSTM model
I made a text classification model using an LSTM with an attention layer. The model trains and works well, but I can't display the attention weights, i.e. the importance/attention of each word in a review (the input text).
The code used for this model…

Okorimi Manoury
- 114
- 1
- 11
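One common way to get at the weights (a sketch with a hypothetical additive-attention layer, not the asker's exact code): have the attention layer return its per-word weights alongside the context vector, and build a second Model that shares the same layers but outputs the weights.

import tensorflow as tf
from tensorflow.keras import layers, Model

class AttentionWithWeights(layers.Layer):
    """Additive attention that also exposes its per-word weights."""
    def build(self, input_shape):
        d = int(input_shape[-1])
        self.W = self.add_weight(name="W", shape=(d, d), initializer="glorot_uniform")
        self.u = self.add_weight(name="u", shape=(d, 1), initializer="glorot_uniform")

    def call(self, h):                               # h: (batch, time, d)
        scores = tf.matmul(tf.tanh(tf.matmul(h, self.W)), self.u)   # (batch, time, 1)
        alpha = tf.nn.softmax(scores, axis=1)        # one weight per word
        context = tf.reduce_sum(alpha * h, axis=1)   # (batch, d)
        return context, tf.squeeze(alpha, -1)

inp = layers.Input(shape=(100,))
emb = layers.Embedding(20000, 128)(inp)
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(emb)
context, alpha = AttentionWithWeights()(h)
out = layers.Dense(1, activation="sigmoid")(context)

model = Model(inp, out)         # train this one as usual
attn_model = Model(inp, alpha)  # shares the trained weights, outputs attention per word

After training, attn_model.predict(x) returns one weight per input token, which can then be aligned with the tokenized review for visualization (e.g. as a heatmap over the words).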
2
votes
1 answer
How to get attention weights in hierarchical model
Model :
sequence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32')
words = embedding_layer(sequence_input)
h_words = Bidirectional(GRU(200, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))(words)
sentence = Attention()(h_words)  # with…

Rohit Saxena
- 31
- 4
2
votes
1 answer
Attention on top of LSTM Keras
I was training an LSTM model using Keras and wanted to add attention on top of it. I am new to Keras and to attention. From the question "How to add an attention mechanism in keras?" I learnt how to add attention over my LSTM layer and built a model like…

hiteshn97
- 90
- 2
- 9
2
votes
1 answer
How to use previous output and hidden states from LSTM for the attention mechanism?
I am currently trying to code the attention mechanism from this paper: "Effective Approaches to Attention-based Neural Machine Translation", Luong, Pham, Manning (2015). (I use global attention with the dot score).
However, I am unsure how to…

Tom
- 275
- 2
- 16
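For reference, here is a hedged sketch of one decoding step of Luong-style global attention with the dot score, using the current decoder hidden state and the encoder outputs; the variable names and shapes are assumptions, not the asker's code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LuongDotAttention(nn.Module):
    """One step of Luong et al. (2015) global attention with the dot score."""
    def __init__(self, hidden_size):
        super().__init__()
        self.w_c = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden: (batch, hidden) = h_t; encoder_outputs: (batch, src_len, hidden)
        scores = torch.bmm(encoder_outputs, decoder_hidden.unsqueeze(2)).squeeze(2)  # dot score, (batch, src_len)
        align = F.softmax(scores, dim=1)                                             # alignment vector a_t
        context = torch.bmm(align.unsqueeze(1), encoder_outputs).squeeze(1)          # context c_t, (batch, hidden)
        # attentional hidden state: h~_t = tanh(W_c [c_t; h_t])
        attn_hidden = torch.tanh(self.w_c(torch.cat([context, decoder_hidden], dim=1)))
        return attn_hidden, align

enc_out = torch.randn(4, 12, 256)   # 4 sentences, 12 source positions
h_t = torch.randn(4, 256)           # decoder hidden state at the current step
attn_h, a = LuongDotAttention(256)(h_t, enc_out)

In the paper, h~_t then feeds the output softmax and, with input feeding, is concatenated to the next decoder input.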
2
votes
0 answers
Using Attention-OCR model (tensorflow/research) for extracting specific information from scanned documents
I have a few questions regarding the Attention-OCR model described in this paper: https://arxiv.org/pdf/1704.03549.pdf
Some context
My goal is to let Attention-OCR learn where to look for, and how to read, a specific piece of information in a scanned document. It…

Filip Dziuba
- 21
- 1
- 6
2
votes
1 answer
Multiple issues with axes while implementing a Seq2Seq with attention in CNTK
I'm trying to implement a Seq2Seq model with attention in CNTK, something very similar to CNTK Tutorial 204. However, several small differences lead to various issues and error messages, which I don't understand. There are many questions here, which…

Skiminok
- 2,801
- 1
- 24
- 29
2
votes
0 answers
Keras ValueError: expected ndim=3, found ndim=4
In my Keras model, I am using a TimeDistributed wrapper, but I keep getting a shape mismatch error. Here are the layers:
r_input = Input(shape=(100,), dtype='int32')
embedded_sequences = embedding_layer(r_input)
r_lstm = Bidirectional(GRU(100,…

bear
- 663
- 1
- 14
- 33
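The usual cause of "expected ndim=3, found ndim=4" in hierarchical setups is that a recurrent layer receives the extra sentence axis directly. A hedged sketch of the standard pattern, with hypothetical dimensions: encode words inside a sub-model and let TimeDistributed apply that whole sub-model per sentence, so no recurrent layer ever sees a 4-D tensor.

import tensorflow as tf
from tensorflow.keras import layers, Model

max_sents, max_words, vocab, emb_dim = 15, 100, 20000, 128  # hypothetical sizes

# Word-level encoder: operates on one sentence, so its tensors stay 3-D.
word_in = layers.Input(shape=(max_words,), dtype="int32")
w = layers.Embedding(vocab, emb_dim)(word_in)
w = layers.Bidirectional(layers.GRU(100))(w)          # (batch, 200)
word_encoder = Model(word_in, w)

# Document-level model: TimeDistributed applies the whole word encoder to each
# sentence slice, so the outer GRU receives (batch, max_sents, 200), i.e. ndim=3.
doc_in = layers.Input(shape=(max_sents, max_words), dtype="int32")
s = layers.TimeDistributed(word_encoder)(doc_in)
s = layers.Bidirectional(layers.GRU(100))(s)
out = layers.Dense(1, activation="sigmoid")(s)
doc_model = Model(doc_in, out)
doc_model.summary()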
1
vote
1 answer
How to replace this naive code with scaled_dot_product_attention() in Pytorch?
Consider a code fragment from Crossformer:
def forward(self, queries, keys, values):
    B, L, H, E = queries.shape
    _, S, _, D = values.shape
    scale = self.scale or 1. / sqrt(E)
    scores = torch.einsum("blhe,bshe->bhls", queries, keys)
    A…

Serge Rogatch
- 13,865
- 7
- 86
- 158
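Assuming the fragment's layout (queries (B, L, H, E), keys/values (B, S, H, E/D)), scaling by 1/sqrt(E), and a softmax over the key axis, the einsum pair can be replaced by one fused call; note that F.scaled_dot_product_attention does not return the attention matrix, so this only works if nothing downstream needs A explicitly. A sketch:

import torch
import torch.nn.functional as F

def forward_sdpa(queries, keys, values):
    # (B, L, H, E) -> (B, H, L, E): scaled_dot_product_attention expects the
    # head axis before the sequence axis.
    q = queries.permute(0, 2, 1, 3)
    k = keys.permute(0, 2, 1, 3)
    v = values.permute(0, 2, 1, 3)
    # Default scale is 1/sqrt(E), matching 1./sqrt(E) above; attention dropout
    # can be added via the dropout_p argument if the original module uses it.
    out = F.scaled_dot_product_attention(q, k, v)
    return out.permute(0, 2, 1, 3)      # back to (B, L, H, D)

q = torch.randn(2, 10, 4, 16)
k = torch.randn(2, 12, 4, 16)
v = torch.randn(2, 12, 4, 16)
print(forward_sdpa(q, k, v).shape)      # torch.Size([2, 10, 4, 16])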
1
vote
0 answers
Hugging Face translation model cross attention layers problem, inconsistent with research
When inspecting the cross-attention layers of the pretrained transformer translation model (MarianMT), it is very strange that the cross-attention from layers 0 and 1 provides the best alignment between input and output. I used bertviz to…

Ayw
- 11
- 3
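For a systematic layer-by-layer comparison (a sketch with an assumed checkpoint name, offered only as an inspection aid, not an explanation of the observation), the raw cross-attention tensors can also be pulled from generate() directly instead of going through bertviz:

import torch
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"   # assumed checkpoint, for illustration only
tok = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

src = tok("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    gen = model.generate(**src, max_new_tokens=20,
                         output_attentions=True,
                         return_dict_in_generate=True)

# gen.cross_attentions has one entry per generated token; each entry is a tuple
# over decoder layers of tensors shaped (batch, heads, 1, src_len).
first_step = gen.cross_attentions[0]
for layer_idx, attn in enumerate(first_step):
    print(f"decoder layer {layer_idx}: {tuple(attn.shape)}")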
1
vote
0 answers
Questions about masks of padding in GPT
The GPT series of models uses the Transformer decoder with unidirectional attention. In the Hugging Face source code for GPT, masked attention is implemented as follows:
self.register_buffer(
    "bias",
    …

LocustNymph
- 11
- 3
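For context, the registered bias buffer only encodes the causal (unidirectional) constraint; padding is handled separately by the attention_mask supplied at call time, and the two are combined before the softmax. A small sketch of that combination, with a hypothetical batch:

import torch

seq_len = 6
# Causal part, as registered in the buffer: position i may attend to j <= i.
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Padding part, from the tokenizer's attention_mask (1 = real token, 0 = pad).
attention_mask = torch.tensor([[1, 1, 1, 1, 0, 0]], dtype=torch.bool)  # hypothetical batch of 1

# Combined: a query may attend to a key only if the key is not in the future
# AND is not a padding token; disallowed positions get -inf before the softmax.
combined = causal.unsqueeze(0) & attention_mask[:, None, :]
print(combined.int())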