Questions regarding the attention-model mechanism in deep learning
Questions tagged [attention-model]
389 questions
2
votes
0 answers
Visualizing self-attention weights for a sequence addition problem with LSTM?
I am using the Self Attention layer from here for a simple problem: adding all the numbers in a sequence that come before a delimiter. With training, I expect the neural network to learn which numbers to add, and using the Self Attention layer, I expect to…

sara_iftikhar
- 43
- 3
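One common way to inspect attention weights in Keras is to build a model that also outputs the attention scores. A minimal sketch, using the built-in tf.keras.layers.Attention (TF 2.4+) as an illustrative stand-in for the third-party layer in the question; layer sizes and shapes here are assumptions:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# toy model: sequence in -> LSTM -> self-attention -> scalar out
inp = layers.Input(shape=(10, 1))
h = layers.LSTM(32, return_sequences=True)(inp)
ctx, scores = layers.Attention()([h, h], return_attention_scores=True)
out = layers.Dense(1)(layers.GlobalAveragePooling1D()(ctx))
model = models.Model(inp, [out, scores])

x = np.random.rand(1, 10, 1)
_, attn = model.predict(x)
print(attn.shape)  # (1, 10, 10): each timestep's weight over the others

The attn array can then be drawn as a heatmap, for example with matplotlib's imshow.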
2
votes
2 answers
assertion failed: [Condition x == y did not hold element-wise:]
I have built a BiLSTM model with an attention layer for a sentence classification task, but I am getting an error that my assertion has failed due to a mismatch in the number of parameters. The attention layer code is here and the error is below the…

PeakyBlinder
- 1,059
- 1
- 14
- 35
2
votes
1 answer
Why is the W_q matrix in torch.nn.MultiheadAttention quadratic?
I am trying to implement nn.MultiheadAttention in my network. According to the docs,
embed_dim – total dimension of the model.
However, according to the source file,
embed_dim must be divisible by num_heads
and
self.q_proj_weight =…

Akim Tsvigun
- 91
- 1
- 8
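The short answer, as far as the PyTorch source shows: with the default kdim == vdim == embed_dim, the q, k, and v projections are each embed_dim x embed_dim (hence quadratic) and are stacked into a single in_proj_weight. A quick check:

import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=512, num_heads=8)
# q, k, v projection matrices stacked along dim 0: 3 * 512 rows
print(mha.in_proj_weight.shape)  # torch.Size([1536, 512])

embed_dim must be divisible by num_heads because each head operates on an embed_dim // num_heads slice of the projected vectors.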
2
votes
1 answer
PyTorch: getting rid of a for loop when adding a permutation of one vector to the entries of a matrix?
I'm trying to implement this paper and am stuck on this simple step. Although this has to do with attention, what I'm stuck on is just how to add a permutation of a vector to the entries of a matrix without using for loops.
The attention scores…

Tinatim
- 143
- 6
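Assuming the goal is something like out[i][j] = M[i][j] + v[i] + v[j] (an assumption, since the excerpt is truncated), broadcasting removes the loop entirely:

import torch

v = torch.randn(5)
M = torch.randn(5, 5)

# v.unsqueeze(1) is a (5, 1) column, v.unsqueeze(0) a (1, 5) row;
# broadcasting adds v[i] + v[j] to every entry M[i, j] with no Python loop
out = M + v.unsqueeze(1) + v.unsqueeze(0)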
2
votes
1 answer
Luong Style Attention Mechanism with Dot and General scoring functions in keras and tensorflow
I am trying to implement, in Keras, the dot-product and general scoring functions for computing similarity scores from the encoder outputs and the decoder hidden states, respectively.
I have got the idea to do the product of…

Ayush Srivastava
- 444
- 1
- 4
- 13
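For reference, a minimal sketch of both Luong scoring functions in TensorFlow; the tensor names and sizes are assumptions, not the asker's code:

import tensorflow as tf

units = 128
enc_outputs = tf.random.normal((4, 20, units))  # (batch, src_len, units)
dec_hidden = tf.random.normal((4, units))       # (batch, units)
q = dec_hidden[:, tf.newaxis, :]                # (batch, 1, units)

# dot score: score(h_t, h_s) = h_t . h_s
score_dot = tf.matmul(q, enc_outputs, transpose_b=True)           # (batch, 1, src_len)

# general score: score(h_t, h_s) = h_t . (W_a h_s), W_a learned
W_a = tf.keras.layers.Dense(units, use_bias=False)
score_general = tf.matmul(q, W_a(enc_outputs), transpose_b=True)  # (batch, 1, src_len)

weights = tf.nn.softmax(score_dot, axis=-1)     # attention over source timesteps
context = tf.matmul(weights, enc_outputs)       # (batch, 1, units)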
2
votes
1 answer
Unable to save model architecture (BiLSTM + attention)
I am working on a multi-label text classification problem. I am trying to add an attention mechanism to a BiLSTM model. The attention mechanism code is taken from here. I am not able to save the model architecture and am getting the error mentioned below.…

joel
- 1,156
- 3
- 15
- 42
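A frequent cause of this kind of save error is a custom layer without get_config, which Keras needs to serialize the architecture. A hedged sketch of the pattern (the scoring logic here is illustrative, not the code from the linked answer):

import tensorflow as tf
from tensorflow.keras import layers

class Attention(layers.Layer):
    def __init__(self, units=50, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.w = layers.Dense(units)

    def call(self, inputs):
        # illustrative additive-style pooling over timesteps
        score = tf.nn.softmax(tf.reduce_sum(self.w(inputs), axis=-1), axis=-1)
        return tf.reduce_sum(inputs * score[..., tf.newaxis], axis=1)

    def get_config(self):
        # required so model.save() / to_json() can serialize the layer
        config = super().get_config()
        config.update({"units": self.units})
        return config

# when loading, register the custom class:
# tf.keras.models.load_model("model.h5", custom_objects={"Attention": Attention})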
2
votes
2 answers
Unable to import AttentionLayer in Keras (TF1.13)
I'm trying to import an attention layer for my encoder-decoder model, but it gives an error.
from keras.layers import AttentionLayer
or
from keras.layers import Attention
following is the error
cannot import name 'AttentionLayer' from…

Crossfit_Jesus
- 53
- 4
- 18
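For what it's worth, keras.layers has never exported a class named AttentionLayer. The built-in layers are Attention and AdditiveAttention in tf.keras, which, if memory serves, are not present as far back as TF 1.13; on TF 2.x the import is:

# no AttentionLayer exists in keras.layers; in TF 2.x use:
from tensorflow.keras.layers import Attention, AdditiveAttention

Under TF 1.13 the usual options are upgrading TensorFlow or writing a small custom attention layer.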
2
votes
1 answer
LSTM + Attention Implementation with undefined timestep shape
I'm trying to implement a stacked LSTM with attention with varying timesteps. I mainly based it on this, this, and this. These implementations, however, assume fixed timesteps. The model runs, but I'm not sure if this is doing what I think…

LogCapy
- 447
- 7
- 20
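Keras handles an undefined timestep dimension if the input shape uses None for time. A minimal sketch (layer sizes assumed) of stacked LSTMs feeding the built-in attention layer:

import tensorflow as tf
from tensorflow.keras import layers, models

inp = layers.Input(shape=(None, 16))            # None: timesteps left undefined
x = layers.LSTM(32, return_sequences=True)(inp)
x = layers.LSTM(32, return_sequences=True)(x)   # stacked LSTMs
ctx = layers.Attention()([x, x])                # self-attention over dynamic length
out = layers.Dense(1)(layers.GlobalAveragePooling1D()(ctx))
model = models.Model(inp, out)
model.summary()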
2
votes
0 answers
Max Sequence length in Seq2Seq - Attention is all you need
I have gone through the paper Attention Is All You Need, and though I think I understand the overall idea of what is happening, I am pretty confused by the way the input is processed. Here are my doubts; for simplicity, let's assume…

Kakarot
- 175
- 1
- 3
- 10
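On the input-processing point: in practice each batch is padded to its longest sequence, and a padding mask stops attention from attending to pad positions. A small sketch of building such a mask in PyTorch (the token values are made up):

import torch

PAD = 0
batch = torch.tensor([[5, 7, 2, PAD, PAD],
                      [3, 9, 4, 6, 1]])
pad_mask = batch.eq(PAD)  # True where padded; shape (batch, seq_len)
# torch.nn.Transformer accepts this as src_key_padding_mask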
2
votes
1 answer
Why is the input size of the MultiheadAttention in Pytorch Transformer module 1536?
When using the torch.nn.modules.transformer.Transformer module/object, the first layer is the encoder.layers.0.self_attn layer, which is a MultiheadAttention layer, i.e.
from torch.nn.modules.transformer import Transformer
bumblebee =…

alvas
- 115,346
- 109
- 446
- 738
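The 1536 comes from the Transformer default d_model = 512: the q, k, and v projections are stacked into a single in_proj_weight, so its first dimension is 3 * 512. A quick check:

from torch.nn.modules.transformer import Transformer

bumblebee = Transformer()  # d_model defaults to 512
w = bumblebee.encoder.layers[0].self_attn.in_proj_weight
print(w.shape)  # torch.Size([1536, 512]): q, k, v stacked, 3 * 512 = 1536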
2
votes
1 answer
How to use loaded LSTM attention model to make predictions on input?
I am a complete beginner in Deep Learning & Keras. I want to build a hierarchical attention network that helps to classify comments into several categories viz. toxic, severely toxic, etc. I took the code from an open repository and saved the model.…

Code231
- 21
- 1
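A hedged outline of the usual prediction flow for a saved Keras model with a custom attention layer; AttLayer, the file name, and maxlen below are placeholders, not details from the repository:

import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# custom layers must be registered when loading ("AttLayer" is hypothetical)
model = tf.keras.models.load_model("han_model.h5",
                                   custom_objects={"AttLayer": AttLayer})

tokenizer = Tokenizer(num_words=20000)     # in practice: reuse the training tokenizer
tokenizer.fit_on_texts(["example corpus"])
seq = tokenizer.texts_to_sequences(["your comment text here"])
x = pad_sequences(seq, maxlen=200)         # same maxlen as at training time
print(model.predict(x))                    # one probability per toxicity label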
2
votes
0 answers
Tensorflow @tf.function: AttributeError: in converted code
I created a class and defined the train_step function inside it: TF tutorial: NMT_attention
Not using @tf.function significantly increases the training time. When I define it, I get a conversion error for the private variables declared inside…

Hackerds
- 1,195
- 2
- 16
- 34
2
votes
1 answer
How to add an attention layer (along with a Bi-LSTM layer) in a Keras sequential model?
I am trying to find an easy way to add an attention layer to a Keras sequential model. However, I have run into a lot of problems achieving that.
I am a novice at deep learning, so I chose Keras as my starting point. My task is to build a Bi-LSTM with attention…

denglizong
- 21
- 1
- 3
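Because the built-in attention layer takes a list of inputs, the functional API is the usual route rather than Sequential. A minimal sketch (vocabulary size, lengths, and units are assumptions):

import tensorflow as tf
from tensorflow.keras import layers, models

inp = layers.Input(shape=(100,))
x = layers.Embedding(20000, 128)(inp)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
ctx = layers.Attention()([x, x])                # dot-product self-attention
x = layers.GlobalAveragePooling1D()(ctx)
out = layers.Dense(1, activation="sigmoid")(x)
model = models.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")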
2
votes
1 answer
Combining CNN with attention network
Here is my attention layer
class Attention(Layer):
    def __init__(self, **kwargs):
        self.init = initializers.get('normal')
        self.supports_masking = True
        self.attention_dim = 50
        super(Attention,…

Pratik.S
- 53
- 6
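One way to combine the two, sketched under assumptions (the attention here is the built-in Keras layer, not the asker's custom class): Conv1D outputs (batch, steps, filters), the same rank an attention layer expects from an RNN, so the convolutional feature steps can be attended over directly.

import tensorflow as tf
from tensorflow.keras import layers, models

inp = layers.Input(shape=(100,))
x = layers.Embedding(20000, 128)(inp)
x = layers.Conv1D(64, 5, padding="same", activation="relu")(x)  # (batch, 100, 64)
ctx = layers.Attention()([x, x])       # attend over convolutional feature steps
x = layers.GlobalMaxPooling1D()(ctx)
out = layers.Dense(1, activation="sigmoid")(x)
model = models.Model(inp, out)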
2
votes
0 answers
PyTorch runtime error: expected argument to have type long, but got CPUType instead
I'm new to PyTorch and going through this tutorial on the transformer model. I'm using PyCharm on Win10.
For now, I've basically just copy-pasted the example code, but I'm getting the following error:
RuntimeError: Expected tensor for argument #1…

SmthgScnng
- 21
- 1
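This error usually means integer indices (for nn.Embedding or similar index-taking ops) were passed with the wrong dtype; casting with .long() fixes it. A minimal reproduction and fix:

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=1000, embedding_dim=32)
idx = torch.tensor([[1, 2, 3]], dtype=torch.float32)  # emb(idx) would raise the Long error
out = emb(idx.long())   # embedding indices must be int64 (torch.long)
print(out.shape)        # torch.Size([1, 3, 32])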