Questions tagged [attention-model]

Questions about the attention mechanism in deep learning models

389 questions
2
votes
1 answer

Some parameters are not getting saved when saving a model in pytorch

I have built an encoder-decoder model with attention for morphological inflection generation. I can train the model and predict on test data, but I get wrong predictions after loading a saved model. I am not getting any error during saving or…
Umang Jain
  • 21
  • 5
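A common cause of this symptom is that some attention weights are created as plain tensors (or kept in a Python list) instead of nn.Parameter / nn.ModuleList, so they never reach state_dict() and are silently not saved. A minimal sketch of the difference, with hypothetical names:

import torch
import torch.nn as nn

class AttnDecoder(nn.Module):                                    # hypothetical minimal module
    def __init__(self, hidden):
        super().__init__()
        self.W_a = nn.Parameter(torch.randn(hidden, hidden))     # registered: saved and loaded
        self.v = torch.randn(hidden)                             # plain tensor: NOT in state_dict()

model = AttnDecoder(8)
print(model.state_dict().keys())        # only 'W_a' shows up; 'v' is silently missing

# Recommended persistence: save/load the state_dict rather than the pickled object.
torch.save(model.state_dict(), "model.pt")
model.load_state_dict(torch.load("model.pt"))

Calling model.eval() before predicting also matters, since dropout left in training mode produces different predictions even when every parameter loaded correctly.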
2
votes
1 answer

Attention layer output shape issue

I have been using BiLSTMs to classify each word in sentences, and my input has shape (n_sentences, max_sequence_length, classes). Recently, I have been trying to use this attention layer:…
D. Clem
  • 85
  • 1
  • 6
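For per-word classification the attention layer has to preserve the time axis; if it pools the sequence to a single vector, the output no longer matches (n_sentences, max_sequence_length, classes). A sketch using the built-in tf.keras Attention layer, with placeholder sizes:

import tensorflow as tf
from tensorflow.keras import layers

max_len, vocab, n_classes = 50, 10000, 5

inp = layers.Input(shape=(max_len,))
x = layers.Embedding(vocab, 128)(inp)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)    # (batch, max_len, 128)
att = layers.Attention()([x, x])                                       # keeps the time axis: (batch, max_len, 128)
out = layers.TimeDistributed(layers.Dense(n_classes, activation='softmax'))(att)
model = tf.keras.Model(inp, out)
model.summary()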
2
votes
0 answers

Graph disconnect in inference in Keras RNN + Encoder/Decoder + Attention

I've successfully trained a model in Keras using an encoder/decoder structure + attention + GloVe embeddings, following several examples, most notably this one and this one. It's based on a modified machine-translation setup. This is a chatbot, so the input is…
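The usual cause of a "graph disconnected" error at inference time is that the decoding model is wired to tensors from the training graph instead of fresh Input placeholders. A sketch of the standard fix, assuming the common Keras seq2seq inference pattern (layer sizes and the vocab size are placeholders; in practice you reuse the trained layer instances rather than the fresh ones created here):

from tensorflow.keras.layers import Input, LSTM, Dense, Attention, Concatenate
from tensorflow.keras.models import Model

latent_dim, vocab = 256, 10000

# Every tensor the inference decoder consumes must be a new Input;
# reusing encoder/decoder tensors from the training model disconnects the graph.
enc_out_in = Input(shape=(None, latent_dim))       # encoder outputs, fed from encoder_model.predict()
state_h_in = Input(shape=(latent_dim,))
state_c_in = Input(shape=(latent_dim,))
token_in   = Input(shape=(None, latent_dim))       # embedded previous token

dec_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
dec_out, h, c = dec_lstm(token_in, initial_state=[state_h_in, state_c_in])
context = Attention()([dec_out, enc_out_in])
probs = Dense(vocab, activation='softmax')(Concatenate()([dec_out, context]))

decoder_model = Model([token_in, enc_out_in, state_h_in, state_c_in], [probs, h, c])
decoder_model.summary()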
2
votes
2 answers

Is attention mechanism really attention or just looking back at memory again?

When reading about the attention mechanism, I am confused by the term "attention". Is it the same as human attention in the usual sense of the word?
Giang Nguyen
  • 450
  • 8
  • 17
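One way to see the connection the answers usually draw: attention is a soft, differentiable memory lookup, i.e. a softmax-weighted re-reading of stored vectors. A tiny numeric sketch:

import torch

memory = torch.randn(5, 8)                   # 5 stored vectors ("memory slots"), dimension 8
query = torch.randn(8)

scores = memory @ query                      # similarity of the query to each slot
weights = torch.softmax(scores, dim=0)       # soft addressing: non-negative, sums to 1
readout = weights @ memory                   # blend of all slots, dominated by the best matches
print(weights, readout.shape)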
2
votes
2 answers

What is used to train a self-attention mechanism?

I've been trying to understand self-attention, but nothing I found explains the concept very well at a high level. Let's say we use self-attention in an NLP task, so our input is a sentence. Then self-attention can be used to measure how…
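A point the answers typically make: there is no separate training signal for self-attention; the query/key/value projections are ordinary weights updated by backpropagating the task loss. A minimal single-head sketch with hypothetical dimensions:

import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """W_q/W_k/W_v are the only trainable parts; they are learned from the downstream loss."""
    def __init__(self, d):
        super().__init__()
        self.W_q = nn.Linear(d, d, bias=False)
        self.W_k = nn.Linear(d, d, bias=False)
        self.W_v = nn.Linear(d, d, bias=False)

    def forward(self, x):                     # x: (batch, seq, d)
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        att = torch.softmax(q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
        return att @ v

x = torch.randn(2, 6, 16)
layer = SelfAttention(16)
loss = layer(x).sum()        # stand-in for a real task loss (e.g. cross-entropy)
loss.backward()              # gradients flow into W_q, W_k, W_v like any other layer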
2
votes
4 answers

Keras: How to display attention weights in LSTM model

I made a text classification model using an LSTM with an attention layer. The model trains and works well, but I can't display the attention weights, i.e. the importance/attention given to each word in a review (the input text). The code used for this model…
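A common recipe, sketched here with a hypothetical custom layer: make the attention layer return its softmax weights as a second output, then build a side model that exposes those weights for any input review.

import tensorflow as tf
from tensorflow.keras import layers, Model

class AttentionWithWeights(layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.score = layers.Dense(1)

    def call(self, h):                               # h: (batch, time, features)
        w = tf.nn.softmax(self.score(h), axis=1)     # (batch, time, 1) attention weights
        context = tf.reduce_sum(w * h, axis=1)       # (batch, features) weighted summary
        return context, tf.squeeze(w, -1)

inp = layers.Input(shape=(None,))
x = layers.Embedding(20000, 128)(inp)
h = layers.LSTM(64, return_sequences=True)(x)
context, att_weights = AttentionWithWeights()(h)
out = layers.Dense(1, activation='sigmoid')(context)

model = Model(inp, out)                              # trained as usual
viewer = Model(inp, att_weights)                     # shares weights, exposes per-word attention
# per-word scores for one review: viewer.predict(padded_review_ids)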
2
votes
1 answer

How to get attention weights in hierarchical model

Model:
sequence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32')
words = embedding_layer(sequence_input)
h_words = Bidirectional(GRU(200, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))(words)
sentence = Attention()(h_words)  # with…
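Continuing that fragment, one sketch of how the weights are usually extracted, assuming the custom Attention layer can be modified to also return its softmax scores (the return_attention flag below is hypothetical):

from tensorflow.keras.models import Model

# Modified layer: call() returns (context_vector, attention_weights)
sentence, word_weights = Attention(return_attention=True)(h_words)    # weights: (batch, MAX_SENT_LENGTH)
word_att_model = Model(sequence_input, word_weights)                  # shares weights with the trained model
# per-word attention for one encoded sentence:
# word_att_model.predict(encoded_sentence[None, :])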
2
votes
1 answer

Attention on top of LSTM Keras

I was training an LSTM model using Keras and wanted to add attention on top of it. I am new to Keras and to attention. From the link "How to add an attention mechanism in keras?" I learnt how I could add attention over my LSTM layer and made a model like…
hiteshn97
  • 90
  • 2
  • 9
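A sketch of one straightforward way to put attention on top of an LSTM using the built-in tf.keras AdditiveAttention layer (all sizes are placeholders, and pooling the attended sequence is only one of several options):

import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(None,))
x = layers.Embedding(10000, 100)(inp)
h = layers.LSTM(128, return_sequences=True)(x)       # keep all timesteps so attention has something to attend over
context = layers.AdditiveAttention()([h, h])         # Bahdanau-style attention of the sequence over itself
pooled = layers.GlobalAveragePooling1D()(context)    # collapse the time axis
out = layers.Dense(1, activation='sigmoid')(pooled)

model = Model(inp, out)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()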
2
votes
1 answer

How to use previous output and hidden states from LSTM for the attention mechanism?

I am currently trying to code the attention mechanism from this paper: "Effective Approaches to Attention-based Neural Machine Translation", Luong, Pham, Manning (2015). (I use global attention with the dot score.) However, I am unsure how to…
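A minimal PyTorch sketch of Luong-style global attention with the dot score, following the paper's definitions (the current decoder hidden state h_t is scored against every encoder state h_s; tensor shapes are assumptions):

import torch

def luong_dot_attention(dec_h_t, enc_outputs):
    """dec_h_t: (batch, hidden); enc_outputs: (batch, src_len, hidden)."""
    # dot score: score(h_t, h_s) = h_t . h_s
    scores = torch.bmm(enc_outputs, dec_h_t.unsqueeze(2)).squeeze(2)    # (batch, src_len)
    align = torch.softmax(scores, dim=1)                                # alignment weights a_t(s)
    context = torch.bmm(align.unsqueeze(1), enc_outputs).squeeze(1)     # context c_t: (batch, hidden)
    return context, align

# In Luong et al. the attentional hidden state is then
#   h_tilde = tanh(W_c [c_t; h_t])
# computed from the *current* decoder state, i.e. attention is applied after
# the LSTM step at time t rather than being fed into it.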
2
votes
0 answers

Using Attention-OCR model (tensorflow/research) for extracting specific information from scanned documents

I have a few questions regarding the Attention-OCR model described in this paper: https://arxiv.org/pdf/1704.03549.pdf. Some context: my goal is to let Attention-OCR learn where to look for, and how to read, a specific piece of information in a scanned document. It…
Filip Dziuba
  • 21
  • 1
  • 6
2
votes
1 answer

Multiple issues with axes while implementing a Seq2Seq with attention in CNTK

I'm trying to implement a Seq2Seq model with attention in CNTK, something very similar to CNTK Tutorial 204. However, several small differences lead to various issues and error messages, which I don't understand. There are many questions here, which…
Skiminok
  • 2,801
  • 1
  • 24
  • 29
2
votes
0 answers

Keras ValueError: expected ndim=3, found ndim=4

In my Keras model, I am using a TimeDistributed wrapper, but I keep getting a shape mismatch error. Here are the layers:
r_input = Input(shape=(100,), dtype='int32')
embedded_sequences = embedding_layer(r_input)
r_lstm = Bidirectional(GRU(100,…
bear
  • 663
  • 1
  • 14
  • 33
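The usual cause in hierarchical models: a recurrent layer only accepts 3-D input (batch, time, features), so a 4-D document tensor has to pass through TimeDistributed(word_model) rather than going straight into a GRU. A sketch with placeholder sizes:

from tensorflow.keras.layers import Input, Embedding, GRU, Bidirectional, TimeDistributed, Dense
from tensorflow.keras.models import Model

MAX_SENTS, MAX_WORDS, VOCAB = 15, 100, 20000

# Word-level encoder: (batch, MAX_WORDS) -> (batch, 200)
w_in = Input(shape=(MAX_WORDS,), dtype='int32')
w_emb = Embedding(VOCAB, 100)(w_in)                      # (batch, MAX_WORDS, 100): ndim=3, fine for a GRU
w_enc = Bidirectional(GRU(100))(w_emb)
word_model = Model(w_in, w_enc)

# Document level: embedding the (batch, MAX_SENTS, MAX_WORDS) tensor directly would
# yield a 4-D tensor, which is exactly what triggers "expected ndim=3, found ndim=4".
d_in = Input(shape=(MAX_SENTS, MAX_WORDS), dtype='int32')
d_enc = TimeDistributed(word_model)(d_in)                # (batch, MAX_SENTS, 200): back to ndim=3
d_out = GRU(100)(d_enc)
model = Model(d_in, Dense(1, activation='sigmoid')(d_out))
model.summary()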
1
vote
1 answer

How to replace this naive code with scaled_dot_product_attention() in Pytorch?

Consider a code fragment from Crossformer:
def forward(self, queries, keys, values):
    B, L, H, E = queries.shape
    _, S, _, D = values.shape
    scale = self.scale or 1. / sqrt(E)
    scores = torch.einsum("blhe,bshe->bhls", queries, keys)
    A…
Serge Rogatch
  • 13,865
  • 7
  • 86
  • 158
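A sketch of the usual translation, assuming the remainder of that forward() just applies dropout and a softmax to the scaled scores and contracts them with the values. torch.nn.functional.scaled_dot_product_attention expects the head-first layout (B, H, L, E), so the tensors are transposed around the call; note that it does not return the attention map itself:

import torch
import torch.nn.functional as F

def forward_sdpa(queries, keys, values, scale=None, dropout_p=0.0):
    """queries: (B, L, H, E); keys/values: (B, S, H, E), as in the Crossformer fragment."""
    q = queries.transpose(1, 2)                 # (B, H, L, E)
    k = keys.transpose(1, 2)                    # (B, H, S, E)
    v = values.transpose(1, 2)                  # (B, H, S, D)
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p, scale=scale)
    return out.transpose(1, 2).contiguous()     # back to (B, L, H, D)

The scale keyword requires a reasonably recent PyTorch; on older releases drop it and rely on the default 1/sqrt(E), which matches the 1./sqrt(E) in the original code.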
1
vote
0 answers

Hugging Face translation model cross attention layers problem, inconsistent with research

When I inspect the cross-attention layers of the pretrained transformer translation model (MarianMT), it is very strange that the cross-attention from layers 0 and 1 provides the best alignment between input and output. I used bertviz to…
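A sketch of how the per-layer cross-attention tensors can be inspected directly, assuming a reasonably recent transformers version and a placeholder Marian checkpoint:

import torch
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"          # placeholder checkpoint
tok = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name, output_attentions=True)

src = tok("The cat sat on the mat.", return_tensors="pt")
tgt = tok(text_target="Die Katze saß auf der Matte.", return_tensors="pt")

with torch.no_grad():
    out = model(**src, labels=tgt["input_ids"])

# out.cross_attentions is a tuple with one tensor per decoder layer, each of shape
# (batch, num_heads, tgt_len, src_len); per-layer or per-head averages give the
# alignment maps that tools like bertviz visualize.
print(len(out.cross_attentions), out.cross_attentions[0].shape)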
1
vote
0 answers

Questions about masks of padding in GPT

The GPT-series models use the Transformer decoder with unidirectional attention. In the Hugging Face source code for GPT, masked attention is implemented via: self.register_buffer( "bias", …
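A small sketch of how the two masks interact, mirroring (not copying) the Hugging Face logic: the registered triangular "bias" buffer enforces causality, while the per-example attention_mask you pass to forward() handles padding; both end up as large negative additions to the attention scores before the softmax.

import torch

T = 6
# Causal mask, as built by the register_buffer("bias", ...) call: (1, 1, T, T) lower triangle
causal = torch.tril(torch.ones(T, T, dtype=torch.bool)).view(1, 1, T, T)

# Padding mask for a batch of 2, where the second sequence ends with 2 padded positions
attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1],
                               [1, 1, 1, 1, 0, 0]])
pad = attention_mask[:, None, None, :].to(torch.bool)      # (batch, 1, 1, T), broadcast over query positions

scores = torch.randn(2, 1, T, T)                           # (batch, heads, T, T) raw attention scores
neg = torch.finfo(scores.dtype).min
scores = scores.masked_fill(~causal, neg)                  # block attention to future positions
scores = scores.masked_fill(~pad, neg)                     # block attention to padded keys
probs = torch.softmax(scores, dim=-1)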