Questions regarding the attention mechanism in deep learning
Questions tagged [attention-model]
389 questions
3
votes
1 answer
Why does Keras not return the full sequence of cell states in the LSTM layer?
I am trying to implement an attention mechanism for which I need the full sequence of cell states (just like the full sequence of hidden states). The Keras LSTM, however, only returns the last cell state:
output, state_h, state_c = layers.LSTM(units=45,…

bcsta
- 1,963
- 3
- 22
- 61
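A minimal sketch of one workaround (not the asker's code; the 45 units, 10 time steps and batch size are assumed): drive tf.keras.layers.LSTMCell manually over time and stack the per-step cell states yourself.

import tensorflow as tf

units, timesteps, features, batch = 45, 10, 8, 2
x = tf.random.normal((batch, timesteps, features))             # (batch, time, features)

cell = tf.keras.layers.LSTMCell(units)
state = [tf.zeros((batch, units)), tf.zeros((batch, units))]   # initial [h0, c0]

h_seq, c_seq = [], []
for t in range(timesteps):
    _, state = cell(x[:, t, :], state)   # state is [h_t, c_t] after each step
    h_seq.append(state[0])
    c_seq.append(state[1])

h_seq = tf.stack(h_seq, axis=1)          # (batch, time, units), like return_sequences=True
c_seq = tf.stack(c_seq, axis=1)          # (batch, time, units), the full cell-state sequence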
3
votes
2 answers
What should be the Query Q, Key K and Value V vectors/matrices in torch.nn.MultiheadAttention?
Following an amazing blog, I implemented my own self-attention module. However, I found that PyTorch has already implemented a multi-head attention module. The input to the forward pass of the MultiheadAttention module includes Q (which is the query vector)…

PinkBanter
- 1,686
- 5
- 17
- 38
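For reference, a minimal self-attention call with nn.MultiheadAttention (the embedding size and head count below are made up, and batch_first assumes a reasonably recent PyTorch): query, key and value are all the same sequence, whereas in cross-attention the key and value would come from the other sequence.

import torch
import torch.nn as nn

embed_dim, num_heads = 64, 8
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(2, 10, embed_dim)                        # (batch, seq_len, embed_dim)
attn_out, attn_weights = mha(query=x, key=x, value=x)    # self-attention: Q = K = V = x
print(attn_out.shape, attn_weights.shape)                # (2, 10, 64) and (2, 10, 10)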
3
votes
1 answer
Should the queries, keys and values of the transformer be split before or after being passed through the linear layers?
I have seen two different implementations of Multi-Head Attention.
In one of the approaches the queries, keys and values are split into heads before being passed through the linear layers as shown below:
def split_heads(self, x, batch_size):
…

Kinyugo
- 429
- 1
- 4
- 11
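The more common ordering, sketched below with illustrative names and sizes (not taken from either implementation in the question), is to project with a full d_model-by-d_model linear layer first and only then reshape into heads; because the projection is linear, this is equivalent to giving each head its own smaller projection of the whole input, whereas splitting before the linear layers restricts each head to a slice of the embedding.

import torch
import torch.nn as nn

class HeadSplitter(nn.Module):
    """Project first, then split into heads (the usual Transformer ordering)."""
    def __init__(self, d_model=64, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads, self.d_head = num_heads, d_model // num_heads
        self.wq = nn.Linear(d_model, d_model)   # one projection over the full model dim

    def forward(self, x):                        # x: (batch, seq, d_model)
        b, t, _ = x.shape
        q = self.wq(x)                           # the linear layer sees the whole embedding
        # reshape afterwards into (batch, num_heads, seq, d_head)
        return q.view(b, t, self.num_heads, self.d_head).transpose(1, 2)

q_heads = HeadSplitter()(torch.randn(2, 5, 64))  # -> (2, 8, 5, 8)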
3
votes
1 answer
Loading a pre-trained Attention model in Keras with custom_objects
I am loading a pre-trained attention model in Keras using load_model().
My Attention class is defined below.
# attention class
from keras.engine.topology import Layer
from keras import initializers, regularizers, constraints
from keras import…

der_radler
- 549
- 1
- 6
- 17
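The usual fix is to pass the class to load_model via custom_objects under the same name it was saved with. The sketch below is self-contained but uses a stand-in Scale layer and a hypothetical file name rather than the asker's Attention class.

import tensorflow as tf
from tensorflow.keras import layers, models

class Scale(layers.Layer):                       # stand-in for the custom Attention layer
    def __init__(self, factor=2.0, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, x):
        return x * self.factor

    def get_config(self):                        # needed so save/load can rebuild the layer
        return {**super().get_config(), 'factor': self.factor}

model = models.Sequential([tf.keras.Input(shape=(4,)), Scale(3.0)])
model.save('with_custom_layer.h5')               # hypothetical path

# The key point: register the class under the saved name via custom_objects.
restored = models.load_model('with_custom_layer.h5', custom_objects={'Scale': Scale})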
3
votes
2 answers
What do the input layers represent in a Hierarchical Attention Network?
I'm trying to grasp the idea of a Hierarchical Attention Network (HAN); most of the code I find online is more or less similar to the one here: https://medium.com/jatana/report-on-text-classification-using-cnn-rnn-han-f0e887214d5f…

amrnablus
- 237
- 1
- 3
- 12
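As a rough sketch of what the two Input layers usually mean in a HAN (all dimensions below are made up): the inner Input is a single sentence, a vector of word indices, and the outer Input is a document, a matrix of sentences, with TimeDistributed applying the sentence encoder to every row.

import tensorflow as tf
from tensorflow.keras import layers, models

MAX_SENTS, MAX_WORDS, VOCAB, EMB = 15, 40, 20000, 100

# Sentence-level encoder: its Input is one sentence of MAX_WORDS word ids.
sent_in = layers.Input(shape=(MAX_WORDS,), dtype='int32')
emb = layers.Embedding(VOCAB, EMB)(sent_in)
sent_vec = layers.Bidirectional(layers.GRU(64))(emb)      # word-level attention omitted for brevity
sent_encoder = models.Model(sent_in, sent_vec)

# Document-level encoder: its Input is one document of MAX_SENTS such sentences.
doc_in = layers.Input(shape=(MAX_SENTS, MAX_WORDS), dtype='int32')
sent_vecs = layers.TimeDistributed(sent_encoder)(doc_in)  # encode every sentence
doc_vec = layers.Bidirectional(layers.GRU(64))(sent_vecs)
out = layers.Dense(5, activation='softmax')(doc_vec)
han = models.Model(doc_in, out)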
3
votes
0 answers
Implementing a simple attention mechanism in Keras
I want to implement a simple attention mechanism to ensemble the results of a CNN model.
Concretely, each example of my input is a sequence of images, so each example has shape [None, img_width, img_height, n_channels].
Using a TimeDistributed…

Jsevillamol
- 2,425
- 2
- 23
- 46
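One lightweight way to do this, sketched with placeholder shapes (a fixed number of frames and a toy CNN): score each TimeDistributed feature vector with a Dense(1), softmax the scores over the time axis, and take the weighted sum of the per-frame features.

import tensorflow as tf
from tensorflow.keras import layers, models

T, H, W, C = 8, 32, 32, 3
frames = layers.Input(shape=(T, H, W, C))                  # sequence of images

cnn = models.Sequential([                                  # toy per-frame CNN
    layers.Conv2D(16, 3, activation='relu'),
    layers.GlobalAveragePooling2D(),
])
feats = layers.TimeDistributed(cnn)(frames)                # (batch, T, 16)

scores = layers.Dense(1)(feats)                            # (batch, T, 1) unnormalized scores
weights = layers.Softmax(axis=1)(scores)                   # attention over the T frames
context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([feats, weights])

out = layers.Dense(10, activation='softmax')(context)      # ensembled prediction
model = models.Model(frames, out)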
3
votes
1 answer
Pytorch softmax along different masks without for loop
Say I have a vector a with an index vector b of the same length. The indices are in the range 0 to N-1, corresponding to N groups. How can I compute a softmax for every group without a for loop?
I'm doing some sort of attention operation here. The numbers for…

Zhang Yu
- 559
- 6
- 15
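A loop-free sketch under the assumed names a (scores) and b (group ids in [0, N)): index_add_ accumulates the exponentials per group, and indexing the sums with b normalizes each element within its own group. Subtracting the global max is enough for stability because softmax is shift-invariant inside each group.

import torch

N = 3
a = torch.tensor([0.5, 2.0, 1.0, -1.0, 0.3])          # scores
b = torch.tensor([0, 1, 0, 2, 1])                     # group id of each element

exp_a = (a - a.max()).exp()
group_sums = torch.zeros(N).index_add_(0, b, exp_a)   # sum of exponentials per group
softmax_per_group = exp_a / group_sums[b]             # softmax within each group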
3
votes
0 answers
Exhaustive concatenation between tensors
I am trying to do an exhaustive concatenation between tensors. So, for example,
I have tensor:
a = torch.randn(3, 512)
I want to concatenate like
concat(t1,t1),concat(t1,t2), concat(t1,t3), concat(t2,t1), concat(t2,t2)....
As a naive…

amy
- 342
- 1
- 5
- 18
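One loop-free way to get all ordered pairs (a sketch; the (3, 3, 1024) layout and the flattened variant are just two common choices) is to broadcast the tensor against itself and concatenate along the feature dimension.

import torch

a = torch.randn(3, 512)
n, d = a.shape

left = a.unsqueeze(1).expand(n, n, d)       # row i repeated across dim 1
right = a.unsqueeze(0).expand(n, n, d)      # row j repeated across dim 0
pairs = torch.cat([left, right], dim=-1)    # (3, 3, 1024); pairs[i, j] = concat(t_i, t_j)
pairs_flat = pairs.reshape(n * n, 2 * d)    # (9, 1024) if a flat list of pairs is preferred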
3
votes
1 answer
Self-Attention GAN in Keras
I'm currently considering implementing the Self-Attention GAN in Keras.
The way I'm thinking of implementing it is as follows:
def Attention(X, channels):
    def hw_flatten(x):
        return np.reshape(x, (x.shape[0], -1, x.shape[-1]))
    f =…

Hao Chen
- 174
- 1
- 4
- 13
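One pitfall visible in that snippet is np.reshape, which cannot operate on symbolic Keras tensors; the flattening and the attention matmuls need tensor ops. A rough TF2-style sketch (layer sizes are illustrative and the learnable gamma of SAGAN is omitted):

import tensorflow as tf
from tensorflow.keras import layers, models

def hw_flatten(x):
    # (batch, H, W, C) -> (batch, H*W, C), keeping the batch dimension symbolic
    return tf.reshape(x, (tf.shape(x)[0], -1, x.shape[-1]))

def self_attention_block(x, channels):
    f = layers.Conv2D(channels // 8, 1)(x)                           # key
    g = layers.Conv2D(channels // 8, 1)(x)                           # query
    h = layers.Conv2D(channels, 1)(x)                                # value
    s = tf.matmul(hw_flatten(g), hw_flatten(f), transpose_b=True)    # (batch, N, N)
    beta = tf.nn.softmax(s, axis=-1)                                 # attention map
    o = tf.reshape(tf.matmul(beta, hw_flatten(h)), tf.shape(x))      # back to (batch, H, W, C)
    return layers.Add()([x, o])                                      # residual connection (gamma omitted)

inp = layers.Input(shape=(32, 32, 64))
sagan_block = models.Model(inp, self_attention_block(inp, 64))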
3
votes
2 answers
LSTM with Attention
I am trying to add an attention mechanism to the stacked LSTM implementation at https://github.com/salesforce/awd-lstm-lm
All examples online use an encoder-decoder architecture, which I do not want to use (do I have to use one for the attention mechanism?).
Basically,…

Boris Mocialov
- 3,439
- 2
- 28
- 55
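Attention does not require an encoder-decoder pair: over a plain (stacked) LSTM the final hidden state can act as the query attending over all time steps. A rough PyTorch sketch with made-up sizes, not tied to the awd-lstm-lm code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveLSTM(nn.Module):
    def __init__(self, vocab=1000, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, num_layers=2, batch_first=True)
        self.score = nn.Linear(hidden, hidden)    # bilinear-style scoring
        self.out = nn.Linear(2 * hidden, vocab)

    def forward(self, tokens):                    # tokens: (batch, seq)
        outputs, (h_n, _) = self.lstm(self.embed(tokens))
        query = h_n[-1]                           # final hidden state of the top layer
        scores = torch.bmm(self.score(outputs), query.unsqueeze(2))   # (batch, seq, 1)
        weights = F.softmax(scores, dim=1)
        context = (weights * outputs).sum(dim=1)  # attention-weighted summary of all steps
        return self.out(torch.cat([context, query], dim=-1))

logits = AttentiveLSTM()(torch.randint(0, 1000, (4, 12)))   # (4, 1000)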
3
votes
1 answer
What does the "source hidden state" refer to in the Attention Mechanism?
The attention weights are computed as:
I want to know what h_s refers to.
In the tensorflow code, the encoder RNN returns a tuple:
encoder_outputs, encoder_state = tf.nn.dynamic_rnn(...)
I think h_s should be the encoder_state, but the…

imhuay
- 271
- 1
- 2
- 11
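In this kind of (Luong-style) setup, h_s usually denotes the encoder's hidden state at each source position, i.e. the full encoder_outputs tensor from dynamic_rnn, while encoder_state is only the final step. A small NumPy sketch of dot-product attention with made-up sizes:

import numpy as np

T_src, units = 7, 32
encoder_outputs = np.random.randn(T_src, units)   # the h_s: one hidden state per source step
h_t = np.random.randn(units)                      # current decoder (target) hidden state

scores = encoder_outputs @ h_t                    # score(h_t, h_s) for every source position
weights = np.exp(scores - scores.max())
weights /= weights.sum()                          # attention weights over the source
context = weights @ encoder_outputs               # weighted sum of the h_s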
3
votes
1 answer
How to use the output of an attention wrapper applied over an LSTM as input to a TimeDistributed layer in Keras?
I have been trying to implement an attention wrapper over the output of the LSTM model shown in this machinelearningmastery tutorial:
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import…

Saurav--
- 1,530
- 2
- 15
- 33
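A rough Keras sketch of one way to wire this up (sizes are placeholders, not the tutorial's): keep return_sequences=True on both LSTMs, compute dot-product attention with the built-in Attention layer, and concatenate the per-step contexts with the decoder outputs before the TimeDistributed classifier.

import tensorflow as tf
from tensorflow.keras import layers, models

T, n_features, units = 5, 10, 50

enc_in = layers.Input(shape=(T, n_features))
enc_seq = layers.LSTM(units, return_sequences=True)(enc_in)     # keys / values
dec_seq = layers.LSTM(units, return_sequences=True)(enc_seq)    # queries

context = layers.Attention()([dec_seq, enc_seq])                # (batch, T, units)
merged = layers.Concatenate()([dec_seq, context])               # per-step context + decoder state
out = layers.TimeDistributed(layers.Dense(n_features, activation='softmax'))(merged)

model = models.Model(enc_in, out)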
3
votes
1 answer
How to modify the Tensorflow Sequence2Sequence model to implement Bidirectional LSTM rather than Unidirectional one?
Refer to this post to know the background of the problem:
Does the TensorFlow embedding_attention_seq2seq method implement a bidirectional RNN Encoder by default?
I am working on the same model, and want to replace the unidirectional LSTM layer with…

Leena Shekhar
- 31
- 3
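A minimal TF 1.x-style sketch of the encoder change (cell and embedding sizes are illustrative, and this is not the embedding_attention_seq2seq code itself): replace the unidirectional encoder with tf.nn.bidirectional_dynamic_rnn and concatenate the forward/backward tensors so the attention mechanism sees both directions.

import tensorflow as tf  # TF 1.x API assumed

num_units = 128
encoder_inputs = tf.placeholder(tf.float32, [None, None, 64])   # (batch, time, emb)

cell_fw = tf.nn.rnn_cell.LSTMCell(num_units)
cell_bw = tf.nn.rnn_cell.LSTMCell(num_units)

(out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, encoder_inputs, dtype=tf.float32)

# Attention memory now carries both directions: (batch, time, 2 * num_units).
encoder_outputs = tf.concat([out_fw, out_bw], axis=-1)

# One common choice for the decoder's initial state: concatenate c and h as well
# (the decoder cell then needs 2 * num_units).
encoder_state = tf.nn.rnn_cell.LSTMStateTuple(
    c=tf.concat([state_fw.c, state_bw.c], axis=-1),
    h=tf.concat([state_fw.h, state_bw.h], axis=-1))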
3
votes
1 answer
Attention mechanism for sequence classification (seq2seq tensorflow r1.1)
I am trying to build a bidirectional RNN with an attention mechanism for sequence classification. I am having some issues understanding the helper function. I have seen that the one used for training needs the decoder inputs, but as I want a single…

JJChickpeaboy
- 55
- 1
- 5
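For a single label per sequence, no decoder (and hence no Helper) is needed at all: a learned query vector can attend over the bidirectional encoder outputs and the pooled vector feeds a classifier. A minimal TF 1.x-style sketch with made-up sizes:

import tensorflow as tf  # TF 1.x API assumed

num_units, num_classes = 128, 5
inputs = tf.placeholder(tf.float32, [None, None, 64])            # (batch, time, emb)

cell_fw = tf.nn.rnn_cell.GRUCell(num_units)
cell_bw = tf.nn.rnn_cell.GRUCell(num_units)
(out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, inputs, dtype=tf.float32)
states = tf.concat([out_fw, out_bw], axis=-1)                    # (batch, time, 2*units)

query = tf.get_variable('attention_query', [2 * num_units])      # learned query vector
scores = tf.tensordot(states, query, axes=[[2], [0]])            # (batch, time)
weights = tf.nn.softmax(scores)                                  # attention over time steps
context = tf.reduce_sum(states * tf.expand_dims(weights, -1), axis=1)

logits = tf.layers.dense(context, num_classes)                   # one prediction per sequence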
2
votes
0 answers
In the sequential recommendation model TiSASRec, why are the results of the baseline model SASRec inconsistent with the actual results?
I am a novice in recommender systems. Recently, I was reading a paper related to sequential recommendation. While running the official sample code of TiSASRec, I used the dataset given in the GitHub repo, removing the ratings and…

jie Zhou
- 21
- 2