Questions tagged [attention-model]

Questions about the attention mechanism in deep learning models.

389 questions
2 votes · 0 answers

Visualizing self-attention weights for a sequence addition problem with LSTM?

I am using the Self Attention layer from here for a simple problem: adding all the numbers in a sequence that come before a delimiter. With training, I expect the neural network to learn which numbers to add, and with the Self Attention layer, I expect to…
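A minimal sketch of how such weights could be extracted and plotted, assuming TF >= 2.4 and using the built-in Attention layer rather than the one linked in the question (all sizes and names here are illustrative):

```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

inputs = tf.keras.Input(shape=(20, 1))                 # sequence of 20 numbers
h = tf.keras.layers.LSTM(32, return_sequences=True)(inputs)
# Self-attention: query == value; ask the layer for its score matrix too.
context, scores = tf.keras.layers.Attention()(
    [h, h], return_attention_scores=True)
out = tf.keras.layers.Dense(1)(tf.keras.layers.GlobalAveragePooling1D()(context))
model = tf.keras.Model(inputs, [out, scores])

x = np.random.rand(1, 20, 1).astype("float32")
_, attn = model.predict(x)                             # attn: (1, 20, 20)
plt.imshow(attn[0], cmap="viridis")                    # rows: queries, cols: keys
plt.xlabel("key position"); plt.ylabel("query position")
plt.colorbar(); plt.show()
```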
2 votes · 2 answers

assertion failed: [Condition x == y did not hold element-wise:]

I have built a BiLSTM model with an attention layer for a sentence classification task, but I am getting an error that my assertion has failed due to a mismatch in the number of parameters. The attention layer code is here and the error is below the…
PeakyBlinder · 1,059
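This class of error usually means the label tensor's shape does not match the model's output shape; a hedged diagnostic sketch, where `model` and `y_train` stand in for the asker's objects:

```python
# Compare the model's output shape with the label shape; a mismatch here
# is the usual cause of "Condition x == y did not hold element-wise".
model.summary()
print("model output:", model.output_shape)   # e.g. (None, num_classes)
print("labels:      ", y_train.shape)        # must match / broadcast with it

# Rule of thumb: integer labels of shape (batch,) pair with
# sparse_categorical_crossentropy; one-hot labels of shape
# (batch, num_classes) pair with categorical_crossentropy.
```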
2 votes · 1 answer

Why is the W_q matrix in torch.nn.MultiheadAttention quadratic?

I am trying to use nn.MultiheadAttention in my network. According to the docs, embed_dim is the total dimension of the model. However, according to the source file, embed_dim must be divisible by num_heads and self.q_proj_weight =…
Akim Tsvigun · 91
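A short check of why the projection is square: with embed_dim = d, each of Q, K and V is projected from d to d and only then split across heads, so W_q maps d to d. A sketch, assuming the default kdim = vdim = embed_dim:

```python
import torch.nn as nn

d, h = 512, 8
mha = nn.MultiheadAttention(embed_dim=d, num_heads=h)
# With the default kdim = vdim = embed_dim, the Q/K/V projections are
# packed into a single (3*d, d) matrix; each slice, including W_q, maps
# d -> d, hence the square ("quadratic") shape.
print(mha.in_proj_weight.shape)   # torch.Size([1536, 512])
print(d // h)                     # 64 dimensions per head after the split
```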
2 votes · 1 answer

PyTorch: get rid of a for loop when adding a permutation of one vector to the entries of a matrix?

I'm trying to implement this paper and am stuck on this simple step. Although this has to do with attention, what I'm stuck on is just how to add a permutation of a vector to the entries of a matrix without using for loops. The attention scores…
Tinatim · 143
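The usual loop-free answer is broadcasting; a small PyTorch sketch with illustrative shapes:

```python
import torch

M = torch.randn(5, 5)
v = torch.randn(5)

# add v[j] to column j of every row (use v.unsqueeze(1) for rows instead):
M_plus = M + v.unsqueeze(0)

# all pairwise sums v[i] + v[j] as a matrix, e.g. for additive scores:
pairwise = v.unsqueeze(1) + v.unsqueeze(0)   # shape (5, 5)
```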
2 votes · 1 answer

Luong Style Attention Mechanism with Dot and General scoring functions in keras and tensorflow

I am trying to implement the dot and general scoring functions in Keras, which compute similarity scores from the encoder outputs and the decoder hidden states respectively. I have got the idea to do the product of…
Ayush Srivastava · 444
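A minimal TensorFlow sketch of the two Luong scores; shapes and names are illustrative, not the asker's code:

```python
import tensorflow as tf

batch, T, units = 4, 10, 64
enc_out = tf.random.normal((batch, T, units))   # encoder outputs
dec_h   = tf.random.normal((batch, units))      # decoder hidden state

# dot:     score(h_t, h_s) = h_t . h_s
dot_scores = tf.einsum('bu,btu->bt', dec_h, enc_out)          # (batch, T)

# general: score(h_t, h_s) = h_t . (W_a h_s)
W_a = tf.keras.layers.Dense(units, use_bias=False)
gen_scores = tf.einsum('bu,btu->bt', dec_h, W_a(enc_out))     # (batch, T)

attn = tf.nn.softmax(gen_scores, axis=-1)                     # attention weights
context = tf.einsum('bt,btu->bu', attn, enc_out)              # (batch, units)
```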
2 votes · 1 answer

Unable to save model architecture (bilstm + attention)

I am working on a multi-label text classification problem and am trying to add an attention mechanism to a BiLSTM model. The attention mechanism code is taken from here. I am not able to save the model architecture and get the error mentioned below.…
joel · 1,156
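Saving usually fails because a custom layer lacks get_config(); a hedged sketch with an illustrative attention layer, not the exact one linked in the question:

```python
import tensorflow as tf

class Attention(tf.keras.layers.Layer):
    def __init__(self, units=50, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.score_dense = tf.keras.layers.Dense(units, activation="tanh")
        self.v = tf.keras.layers.Dense(1)

    def call(self, inputs):                      # inputs: (batch, T, features)
        w = tf.nn.softmax(self.v(self.score_dense(inputs)), axis=1)
        return tf.reduce_sum(w * inputs, axis=1)  # weighted sum over time

    def get_config(self):                        # <- required for saving
        config = super().get_config()
        config.update({"units": self.units})
        return config

# later:
# model.save("model.h5")
# tf.keras.models.load_model("model.h5", custom_objects={"Attention": Attention})
```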
2 votes · 2 answers

Unable to import AttentionLayer in Keras (TF1.13)

I'm trying to import an attention layer for my encoder-decoder model, but it gives an error. from keras.layers import AttentionLayer or from keras.layers import Attention; the following is the error: cannot import name 'AttentionLayer' from…
Crossfit_Jesus · 53
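For context: keras has never shipped a layer named AttentionLayer (that name comes from third-party repos), and the built-in Attention/AdditiveAttention layers postdate TF 1.13, so the usual options are upgrading or defining the layer yourself. A quick check:

```python
import tensorflow as tf
print(tf.__version__)

# These imports work on recent TF 2.x installs, but not on 1.13:
from tensorflow.keras.layers import Attention, AdditiveAttention
```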
2 votes · 1 answer

LSTM + Attention Implementation with undefined timestep shape

I'm trying to implement a stacked LSTM with attention and varying timesteps. I mainly based it off of this, this, and this. These implementations, however, assume fixed timesteps. The model runs, but I'm not sure it is doing what I think…
LogCapy · 447
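A sketch of attention over an unspecified timestep axis, with illustrative sizes: leaving the timestep dimension as None and reducing over it means no fixed T is ever assumed:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(None, 128))                 # timesteps = None
h = tf.keras.layers.LSTM(64, return_sequences=True)(inputs)
h = tf.keras.layers.LSTM(64, return_sequences=True)(h)     # stacked LSTMs
scores = tf.keras.layers.Dense(1, activation="tanh")(h)    # (batch, T, 1)
weights = tf.keras.layers.Softmax(axis=1)(scores)          # over the T axis
context = tf.reduce_sum(weights * h, axis=1)               # (batch, 64)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(context)
model = tf.keras.Model(inputs, outputs)
model.summary()                                            # T stays unspecified
```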
2 votes · 0 answers

Max Sequence length in Seq2Seq - Attention is all you need

I have gone through the paper Attention Is All You Need, and though I think I understood the overall idea behind what is happening, I am pretty confused by the way the input is processed. Here are my doubts; for simplicity, let's assume…
Kakarot · 175
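One point that often clears this up: the only length-dependent input processing in the paper is the sinusoidal positional encoding, which is computed for whatever length arrives, so the architecture itself imposes no hard maximum sequence length (memory does). A sketch of the paper's formula:

```python
import numpy as np

def positional_encoding(T, d_model):
    # PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(...)
    pos = np.arange(T)[:, None]                       # (T, 1) positions
    i = np.arange(d_model)[None, :]                   # (1, d_model) dims
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

print(positional_encoding(50, 512).shape)             # (50, 512)
print(positional_encoding(3000, 512).shape)           # any length works
```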
2 votes · 1 answer

Why is the input size of the MultiheadAttention in Pytorch Transformer module 1536?

When using the torch.nn.modules.transformer.Transformer module/object, the first layer is encoder.layers.0.self_attn, which is a MultiheadAttention layer, i.e. from torch.nn.modules.transformer import Transformer bumblebee =…
alvas · 115,346
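The short answer is 1536 = 3 × d_model: the Q, K and V projections for the default d_model = 512 are packed into one in-projection matrix. A sketch:

```python
import torch.nn as nn

bumblebee = nn.Transformer(d_model=512, nhead=8)
attn = bumblebee.encoder.layers[0].self_attn       # a MultiheadAttention
print(attn.in_proj_weight.shape)   # torch.Size([1536, 512]) = (3*512, 512)
```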
2 votes · 1 answer

How to use loaded LSTM attention model to make predictions on input?

I am a complete beginner in deep learning and Keras. I want to build a hierarchical attention network that helps classify comments into several categories, viz. toxic, severely toxic, etc. I took the code from an open repository and saved the model.…
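A hedged sketch of the usual reload-and-predict pattern; AttentionLayer, the tokenizer, the file name and the maxlen are stand-ins from the asker's training code, not verified API:

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

# AttentionLayer and tokenizer come from the training script; they are
# placeholders here.
model = tf.keras.models.load_model(
    "han_model.h5", custom_objects={"AttentionLayer": AttentionLayer})

seq = tokenizer.texts_to_sequences(["example comment to score"])
x = pad_sequences(seq, maxlen=200)          # use the training-time maxlen
probs = model.predict(x)                    # (1, num_labels), one prob per tag
print(probs > 0.5)                          # per-label decision
```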
2 votes · 0 answers

Tensorflow @tf.function: AttributeError: in converted code

I created a class and defined the train_step function inside it (TF tutorial: NMT_attention). Training without @tf.function significantly increases the training time, but on defining it, I get a conversion error for the private variables declared inside…
Hackerds · 1,195
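A sketch of the pattern that normally avoids this: create the model, optimizer and any variables once, outside the traced function, and let @tf.function close over them; creating attributes inside the traced call is what typically breaks AutoGraph. The loss here is illustrative:

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.build(input_shape=(None, 4))          # variables created before tracing
optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(inp, targ):
    # Only tensors flow through the traced function.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(inp) - targ))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

print(train_step(tf.random.normal((8, 4)), tf.zeros((8, 1))))
```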
2 votes · 1 answer

How to add an attention layer (along with a Bi-LSTM layer) in keras sequential model?

I am trying to find an easy way to add an attention layer to a Keras sequential model. However, I have met a lot of problems in achieving that. I am a novice at deep learning, so I chose Keras as my starting point. My task is to build a Bi-LSTM with attention…
denglizong · 21
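Attention needs access to all LSTM timesteps at once, which the Sequential API cannot express cleanly, so answers typically switch to the functional API. A sketch with illustrative sizes:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(100,))                       # token ids
x = tf.keras.layers.Embedding(10000, 128)(inputs)
h = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(64, return_sequences=True))(x)     # (batch, 100, 128)
scores = tf.keras.layers.Dense(1, activation="tanh")(h)     # (batch, 100, 1)
weights = tf.keras.layers.Softmax(axis=1)(scores)
context = tf.reduce_sum(weights * h, axis=1)                # weighted sum
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(context)
model = tf.keras.Model(inputs, outputs)
```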
2 votes · 1 answer

Combining CNN with attention network

Here is my attention layer:

```python
class Attention(Layer):
    def __init__(self, **kwargs):
        self.init = initializers.get('normal')
        self.supports_masking = True
        self.attention_dim = 50
        super(Attention,…
```
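One common way to combine the two, sketched with illustrative sizes: a Conv1D front-end produces a feature sequence, and the attention layer pools it over time:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(200, 300))                    # (T, emb_dim)
f = tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu")(inputs)
scores = tf.keras.layers.Dense(1, activation="tanh")(f)      # (batch, 200, 1)
weights = tf.keras.layers.Softmax(axis=1)(scores)
context = tf.reduce_sum(weights * f, axis=1)                 # attention pooling
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(context)
model = tf.keras.Model(inputs, outputs)
```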
2 votes · 0 answers

PyTorch runtime error: expected argument to have type long, but got CPUType instead

I'm new to PyTorch and am going through this tutorial on the transformer model. I'm using PyCharm on Windows 10. For now, I've basically just copy-pasted the example code, but I'm getting the following error: RuntimeError: Expected tensor for argument #1…
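This error most often comes from feeding float tensors to nn.Embedding, whose first argument ("argument #1 'indices'") must be a Long (int64) index tensor; a small sketch:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=1000, embedding_dim=32)
tokens = torch.tensor([[1, 5, 42]])          # dtype torch.int64 -> OK
print(emb(tokens).shape)                     # torch.Size([1, 3, 32])

bad = torch.tensor([[1., 5., 42.]])          # float tensor would raise
print(emb(bad.long()).shape)                 # fix: cast indices with .long()
```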