Questions tagged [attention-model]

Questions regarding the attention mechanism in deep learning

389 questions
3
votes
1 answer

Why does Keras not return the full sequence of cell states in the LSTM layer?

I am trying to implement an attention mechanism where I need the full sequence of cell states (just like the full sequence of hidden states). The Keras LSTM, however, only returns the last cell state: output, state_h, state_c = layers.LSTM(units=45,…
bcsta
  • 1,963
  • 3
  • 22
  • 61
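
A minimal sketch of one possible workaround (assuming TensorFlow 2.x; not taken from the question): return_sequences=True only applies to the hidden state and return_state=True only yields the final state_h/state_c, so the cell-state sequence can be exposed by wrapping LSTMCell so that its per-step output is c_t.

import tensorflow as tf
from tensorflow.keras import layers

class CellStateLSTMCell(layers.LSTMCell):
    """LSTMCell whose per-step output is the cell state c_t instead of h_t."""
    def call(self, inputs, states, training=None):
        _, new_states = super().call(inputs, states, training=training)
        return new_states[1], new_states   # emit c_t at every time step

units = 45                                  # matches the units in the question
x = tf.random.normal((2, 10, 8))            # (batch, time, features)
c_seq = layers.RNN(CellStateLSTMCell(units), return_sequences=True)(x)
print(c_seq.shape)                          # (2, 10, 45): the full cell-state sequence
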
3
votes
2 answers

What should be the Query Q, Key K and Value V vectors/matrices in torch.nn.MultiheadAttention?

Following an amazing blog, I implemented my own self-attention module. However, I found that PyTorch has already implemented a multi-head attention module. The input to the forward pass of the MultiheadAttention module includes Q (which is the query vector)…
PinkBanter
  • 1,686
  • 5
  • 17
  • 38
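
A minimal usage sketch (assumed shapes, not taken from the question): for self-attention over a single sequence, the query, key and value passed to the module are all the same tensor; the module applies its own learned projections internally.

import torch
import torch.nn as nn

embed_dim, num_heads = 64, 8
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(2, 10, embed_dim)              # (batch, seq_len, embed_dim)
attn_out, attn_weights = mha(query=x, key=x, value=x)
print(attn_out.shape, attn_weights.shape)      # (2, 10, 64) and (2, 10, 10)

For cross-attention (e.g. a decoder attending to an encoder), key and value would come from the other sequence while query comes from the current one.
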
3
votes
1 answer

Should the queries, keys and values of the transformer be split before or after being passed through the linear layers?

I have seen two different implementations of Multi-Head Attention. In one of the approaches, the queries, keys and values are split into heads before being passed through the linear layers, as shown below: def split_heads(self, x, batch_size): …
Kinyugo
  • 429
  • 1
  • 4
  • 11
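
A minimal sketch of the more common convention (TensorFlow, assumed dimensions; both orders appear in the wild): each of Q, K and V is first projected with a single linear layer over the full model width and only then reshaped into heads, which keeps the projection a full d_model x d_model matrix rather than a block-diagonal one.

import tensorflow as tf

batch_size, seq_len, d_model, num_heads = 2, 10, 64, 8
depth = d_model // num_heads

wq = tf.keras.layers.Dense(d_model)            # one projection over the full width
x = tf.random.normal((batch_size, seq_len, d_model))

q = wq(x)                                      # (batch, seq_len, d_model)
q = tf.reshape(q, (batch_size, seq_len, num_heads, depth))
q = tf.transpose(q, perm=[0, 2, 1, 3])         # (batch, num_heads, seq_len, depth)
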
3
votes
1 answer

Loading a pre-trained attention model in Keras with custom_objects

I am loading a pre-trained attention model in Keras using load_model(). My Attention class is defined below. # attention class from keras.engine.topology import Layer from keras import initializers, regularizers, constraints from keras import…
der_radler
  • 549
  • 1
  • 6
  • 17
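
A minimal sketch (the file name is a placeholder and the layer below is only a stand-in for the question's Attention class): the custom class has to be passed to load_model via custom_objects so Keras can deserialize it.

from tensorflow.keras.layers import Layer
from tensorflow.keras.models import load_model

class Attention(Layer):                 # stand-in for the custom layer in the question
    def call(self, inputs):
        return inputs

model = load_model('model_with_attention.h5',
                   custom_objects={'Attention': Attention})
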
3
votes
2 answers

What do the input layers represent in a Hierarchical Attention Network?

I'm trying to grasp the idea of a Hierarchical Attention Network (HAN); most of the code I find online is more or less similar to the one here: https://medium.com/jatana/report-on-text-classification-using-cnn-rnn-han-f0e887214d5f…
amrnablus
  • 237
  • 1
  • 3
  • 12
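
A minimal sketch of the two nested inputs in a HAN (assumed sizes; the word- and sentence-level attention layers are omitted for brevity): the inner Input is a single sentence as a sequence of word indices, and the outer Input is a document as a sequence of sentences, with the sentence encoder applied to each sentence via TimeDistributed.

from tensorflow.keras import layers, Model

max_sents, max_words, vocab_size, emb_dim = 15, 30, 20000, 100

# word level: encodes one sentence into a fixed-size vector
word_in = layers.Input(shape=(max_words,), dtype='int32')
emb = layers.Embedding(vocab_size, emb_dim)(word_in)
sent_vec = layers.Bidirectional(layers.GRU(64))(emb)
sentence_encoder = Model(word_in, sent_vec)

# sentence level: encodes a document as a sequence of sentence vectors
doc_in = layers.Input(shape=(max_sents, max_words), dtype='int32')
sent_seq = layers.TimeDistributed(sentence_encoder)(doc_in)
doc_vec = layers.Bidirectional(layers.GRU(64))(sent_seq)
out = layers.Dense(1, activation='sigmoid')(doc_vec)
han = Model(doc_in, out)
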
3
votes
0 answers

Implementing a simple attention mechanism in Keras

I want to implement a simple attention mechanism to ensemble the results of a CNN model. Concretely, each example of my input is a sequence of images, so each example has shape [None, img_width, img_height, n_channels]. Using a TimeDistributed…
Jsevillamol
  • 2,425
  • 2
  • 23
  • 46
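
A minimal sketch (assumed sizes, not the asker's model): a shared CNN wrapped in TimeDistributed produces one feature vector per image, a Dense(1) plus softmax over the time axis produces the attention weights, and the weighted sum is the ensembled representation.

import tensorflow as tf
from tensorflow.keras import layers, Model

n_steps, img_w, img_h, n_ch = 5, 32, 32, 3

frames = layers.Input(shape=(n_steps, img_w, img_h, n_ch))

# shared CNN applied to every image in the sequence
cnn = tf.keras.Sequential([
    layers.Conv2D(16, 3, activation='relu'),
    layers.GlobalAveragePooling2D(),
])
feats = layers.TimeDistributed(cnn)(frames)                 # (batch, n_steps, 16)

# one attention score per time step, normalized over the steps
scores = layers.Dense(1)(feats)                             # (batch, n_steps, 1)
weights = layers.Softmax(axis=1)(scores)
context = layers.Lambda(
    lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([feats, weights])   # (batch, 16)

out = layers.Dense(10, activation='softmax')(context)       # assumed 10 classes
model = Model(frames, out)
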
3
votes
1 answer

PyTorch softmax along different masks without a for loop

Say I have a vector a, with an index vector b of the same length. The indices are in the range 0~N-1, corresponding to N groups. How can I do a softmax for every group without a for loop? I'm doing some sort of attention operation here. The numbers for…
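
A minimal sketch of one loop-free approach (assuming PyTorch >= 1.12 for scatter_reduce): compute a per-group maximum for numerical stability, exponentiate, accumulate per-group sums with index_add_, and gather the normalizers back per element.

import torch

a = torch.randn(7)                         # values
b = torch.tensor([0, 1, 0, 2, 1, 2, 0])    # group id of each element
N = 3                                      # number of groups

# per-group max, subtracted for numerical stability
group_max = torch.full((N,), float('-inf')).scatter_reduce(0, b, a, reduce='amax')
exp = torch.exp(a - group_max[b])

# per-group normalizing constants, gathered back to each element
denom = torch.zeros(N).index_add_(0, b, exp)
softmax_by_group = exp / denom[b]          # softmax within each group of b
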
3
votes
0 answers

Exhaustive concatenation between tensors

I am trying to do an exhaustive concatenation between tensors. So, for example, I have the tensor a = torch.randn(3, 512) and I want to concatenate like concat(t1,t1), concat(t1,t2), concat(t1,t3), concat(t2,t1), concat(t2,t2)… As a naive…
amy
  • 342
  • 1
  • 5
  • 18
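
A minimal sketch of a loop-free way to build every ordered pair (assuming the t1, t2, t3 in the question are the rows of a): broadcast the rows against each other with expand and concatenate along the feature dimension.

import torch

a = torch.randn(3, 512)
n, d = a.shape

# pairs[i, j] == torch.cat([a[i], a[j]])
left = a.unsqueeze(1).expand(n, n, d)      # a[i] repeated along dim 1
right = a.unsqueeze(0).expand(n, n, d)     # a[j] repeated along dim 0
pairs = torch.cat([left, right], dim=-1)   # (3, 3, 1024)

flat = pairs.reshape(n * n, 2 * d)         # (9, 1024), if a flat list is needed
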
3
votes
1 answer

Self-Attention GAN in Keras

I'm currently considering implementing the Self-Attention GAN in Keras. The way I'm thinking of implementing it is as follows: def Attention(X, channels): def hw_flatten(x): return np.reshape(x, (x.shape[0], -1, x.shape[-1])) f =…
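
A minimal sketch of the flattening and attention-map step (assumed static spatial dimensions; not the asker's full block): inside a Keras/TensorFlow graph the reshape has to use TensorFlow ops rather than NumPy, and the batch dimension should stay symbolic.

import tensorflow as tf

def hw_flatten(x):
    # (batch, h, w, c) -> (batch, h * w, c); -1 keeps the batch dim symbolic
    return tf.reshape(x, (-1, x.shape[1] * x.shape[2], x.shape[3]))

x = tf.random.normal((2, 8, 8, 64))
channels = 64
f = tf.keras.layers.Conv2D(channels // 8, 1)(x)                   # query projection
g = tf.keras.layers.Conv2D(channels // 8, 1)(x)                   # key projection
s = tf.matmul(hw_flatten(g), hw_flatten(f), transpose_b=True)     # (2, 64, 64) attention map
beta = tf.nn.softmax(s)                                           # attention over spatial locations
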
3
votes
2 answers

LSTM with Attention

I am trying to add an attention mechanism to the stacked LSTM implementation https://github.com/salesforce/awd-lstm-lm. All the examples online use an encoder-decoder architecture, which I do not want to use (do I have to, for the attention mechanism?). Basically,…
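
A minimal sketch (PyTorch, assumed dimensions; not tied to the awd-lstm-lm code): attention does not require an encoder-decoder pair; the LSTM's own step outputs can be scored against a learned query vector and pooled into a single context vector.

import torch
import torch.nn as nn
import torch.nn.functional as F

input_size, hidden, seq_len, batch = 100, 128, 20, 4
lstm = nn.LSTM(input_size, hidden, num_layers=2, batch_first=True)
query = nn.Parameter(torch.randn(hidden))          # learned attention query

x = torch.randn(batch, seq_len, input_size)
outputs, _ = lstm(x)                               # (batch, seq_len, hidden)

scores = outputs @ query                           # (batch, seq_len)
weights = F.softmax(scores, dim=1).unsqueeze(-1)   # (batch, seq_len, 1)
context = (weights * outputs).sum(dim=1)           # (batch, hidden)
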
3
votes
1 answer

What does the "source hidden state" refer to in the Attention Mechanism?

The attention weights are computed as shown in the formula (image not reproduced here). I want to know what h_s refers to. In the TensorFlow code, the encoder RNN returns a tuple: encoder_outputs, encoder_state = tf.nn.dynamic_rnn(...) I think h_s should be the encoder_state, but the…
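
A minimal sketch (assumed shapes) of what the score is computed against: in Luong-style attention, h_s denotes the per-time-step source hidden states, i.e. the full encoder_outputs sequence, while encoder_state is only the final state.

import tensorflow as tf

batch, src_len, units = 2, 7, 32
encoder_outputs = tf.random.normal((batch, src_len, units))   # h_s: one state per source step
decoder_hidden = tf.random.normal((batch, units))             # h_t: current target state

# Luong "dot" score: score(h_t, h_s) for every source position s
scores = tf.einsum('bu,bsu->bs', decoder_hidden, encoder_outputs)   # (batch, src_len)
attention_weights = tf.nn.softmax(scores, axis=-1)
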
3
votes
1 answer

How to use the output of an attention wrapper applied over an LSTM as input to a TimeDistributed layer in Keras?

I have been trying to implement an attention wrapper over the output of the LSTM model shown in this machinelearningmastery tutorial: from numpy import array from keras.models import Sequential from keras.layers import Dense from keras.layers import…
Saurav--
  • 1,530
  • 2
  • 15
  • 33
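
A minimal sketch (assumed sizes, not the tutorial's exact model): as long as the LSTM keeps return_sequences=True and the attention layer returns a tensor with a time axis, its output can feed TimeDistributed directly; here the built-in dot-product Attention layer is used for self-attention over the LSTM outputs.

from tensorflow.keras import layers, Model

timesteps, features, n_out = 5, 10, 3

inp = layers.Input(shape=(timesteps, features))
seq = layers.LSTM(50, return_sequences=True)(inp)        # (batch, timesteps, 50)
attended = layers.Attention()([seq, seq])                # self-attention, same shape
out = layers.TimeDistributed(layers.Dense(n_out, activation='softmax'))(attended)
model = Model(inp, out)
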
3
votes
1 answer

How to modify the TensorFlow Sequence2Sequence model to implement a bidirectional LSTM rather than a unidirectional one?

Refer to this post to know the background of the problem: Does the TensorFlow embedding_attention_seq2seq method implement a bidirectional RNN Encoder by default? I am working on the same model, and want to replace the unidirectional LSTM layer with…
3
votes
1 answer

Attention mechanism for sequence classification (seq2seq tensorflow r1.1)

I am trying to build a bidirectional RNN with an attention mechanism for sequence classification. I am having some issues understanding the helper function. I have seen that the one used for training needs the decoder inputs, but as I want a single…
2
votes
0 answers

In the sequential recommendation model TiSASRec, why are the results of the baseline model SASRec inconsistent with the actual ones?

I am a novice in recommender systems. Recently, I was reading a paper related to sequential recommendation. While running the official sample code of TiSASRec, I used the dataset given in the GitHub repo by removing the ratings and…