Questions regarding the attention mechanism in deep learning models
Questions tagged [attention-model]
389 questions
5
votes
0 answers
Retrieving attention weights for sentences? Most attentive sentences are zero vectors
I have a document classification task that classifies documents as good (1) or bad (0), and I use sentence embeddings for each document to classify the documents accordingly.
What I'd like to do is retrieve the attention scores for each…

Felix
- 313
- 1
- 3
- 22
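A minimal sketch of one way to expose per-sentence attention scores for a question like the one above, assuming a functional tf.keras model in which a learned query attends over precomputed sentence embeddings; the layer name "sentence_attention" and all shapes are hypothetical:

import tensorflow as tf

num_sentences, emb_dim = 30, 384          # hypothetical document size / embedding size
sent_emb = tf.keras.Input(shape=(num_sentences, emb_dim), name="sentence_embeddings")

# Learned query vector that attends over the sentence embeddings.
query = tf.keras.layers.Dense(emb_dim)(tf.keras.layers.GlobalAveragePooling1D()(sent_emb))
query = tf.keras.layers.Reshape((1, emb_dim))(query)

attn = tf.keras.layers.Attention(name="sentence_attention")
context, scores = attn([query, sent_emb], return_attention_scores=True)

doc_vec = tf.keras.layers.Flatten()(context)
output = tf.keras.layers.Dense(1, activation="sigmoid")(doc_vec)

model = tf.keras.Model(sent_emb, output)

# A second model that shares the same layers but outputs the attention scores,
# so the per-sentence weights can be inspected after training.
score_model = tf.keras.Model(sent_emb, scores)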
5
votes
2 answers
Why use multi-headed attention in Transformers?
I am trying to understand why transformers use multiple attention heads. I found the following quote:
Instead of using a single attention function where the attention can be dominated by the actual word itself, transformers use multiple attention…

SomeDutchGuy
- 2,249
- 4
- 16
- 42
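A small sketch that makes the "multiple heads" idea from the question above concrete with tf.keras.layers.MultiHeadAttention: each head learns its own attention distribution, so no single pattern (such as a token attending mostly to itself) has to dominate. The shapes are arbitrary examples:

import tensorflow as tf

x = tf.random.normal((2, 5, 32))                 # (batch, seq_len, model_dim), arbitrary

mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=8)
out, attn = mha(query=x, value=x, return_attention_scores=True)

print(out.shape)    # (2, 5, 32)  - same model dimension as the input
print(attn.shape)   # (2, 4, 5, 5) - one 5x5 attention matrix per head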
5
votes
2 answers
Is there any way to convert a PyTorch tensor to a TensorFlow tensor?
https://github.com/taoshen58/BiBloSA/blob/ec67cbdc411278dd29e8888e9fd6451695efc26c/context_fusion/self_attn.py#L29
I need to use multi_dimensional_attention from the above link, which is implemented in TensorFlow, but I am using PyTorch, so can I…

waleed hamid
- 51
- 1
- 2
- 5
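For the conversion question above, the common route is through NumPy; a minimal sketch (tensor contents are arbitrary, and gradients do not flow across the two frameworks):

import torch
import tensorflow as tf

t = torch.randn(3, 4, requires_grad=True)

# PyTorch -> TensorFlow: detach from the autograd graph, move to CPU, go via NumPy.
tf_tensor = tf.convert_to_tensor(t.detach().cpu().numpy())

# TensorFlow -> PyTorch: the reverse direction works the same way.
back = torch.from_numpy(tf_tensor.numpy())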
5
votes
1 answer
Is there a way to use the native tf Attention layer with keras Sequential API?
I'm looking to use this particular class. I have found custom implementations such as this one. What I'm truly looking for is the use of this particular class with the…

Wajd Meskini
- 94
- 1
- 6
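One reason tf.keras.layers.Attention is awkward in a Sequential model is that it expects a list of tensors ([query, value]) rather than a single input; a minimal sketch of the usual workaround with the functional API, where all layer sizes are arbitrary:

import tensorflow as tf

inputs = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(10000, 64)(inputs)
x = tf.keras.layers.LSTM(64, return_sequences=True)(x)

# Self-attention: the sequence attends over itself, so query and value are the same tensor.
attended = tf.keras.layers.Attention()([x, x])

x = tf.keras.layers.GlobalAveragePooling1D()(attended)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)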
5
votes
1 answer
Differences between different attention layers for Keras
I am trying to add an attention layer to my text classification model. The inputs are texts (e.g. movie reviews), and the output is a binary outcome (e.g. positive vs. negative).
model = Sequential()
model.add(Embedding(max_features, 32,…

Dr. Who
- 153
- 1
- 14
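As a rough illustration of the difference behind questions like the one above: tf.keras.layers.Attention uses a dot-product (Luong-style) score, while tf.keras.layers.AdditiveAttention uses an additive (Bahdanau-style) score; both take [query, value] and return a context tensor shaped like the query. A sketch with arbitrary shapes:

import tensorflow as tf

query = tf.random.normal((2, 7, 16))   # (batch, query_len, dim)
value = tf.random.normal((2, 9, 16))   # (batch, value_len, dim)

luong = tf.keras.layers.Attention()([query, value])             # dot-product scores
bahdanau = tf.keras.layers.AdditiveAttention()([query, value])  # additive scores

print(luong.shape, bahdanau.shape)   # both (2, 7, 16)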
5
votes
1 answer
Cannot parse GraphDef file in function 'ReadTFNetParamsFromTextFileOrDie' in OpenCV-DNN TensorFlow
I want to wrap the attention-OCR model with OpenCV-DNN to speed up inference. I am using the TF code from the official TF models repo.
For wrapping TF model with OpenCV-DNN, I am referring to this code. The cv2.dnn.readNetFromTensorflow()…

Chintan
- 454
- 6
- 15
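A hedged note for the OpenCV-DNN question above: cv2.dnn.readNetFromTensorflow expects a frozen binary GraphDef (.pb), and "Cannot parse GraphDef file" often means the file is a checkpoint, SavedModel, or text proto instead. A rough TF1-style sketch of freezing before loading; the output node name used here is a placeholder, not necessarily the real output node of the attention-OCR model:

import cv2
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

with tf.Session() as sess:
    # ... build the model and restore its weights here ...
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ["AttentionOcr_v1/predicted_chars"])  # placeholder node name
    with open("frozen_graph.pb", "wb") as f:
        f.write(frozen.SerializeToString())

net = cv2.dnn.readNetFromTensorflow("frozen_graph.pb")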
5
votes
0 answers
how to access the attention weights from the attention class
class AttLayer(Layer):
    def __init__(self, **kwargs):
        self.init = initializations.get('normal')
        # self.input_spec = [InputSpec(ndim=3)]
        super(AttLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        …

prashant ranjan
- 51
- 2
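A minimal sketch of one way to expose the weights from a custom attention layer like the AttLayer above: have call() return both the context vector and the attention weights, then build a side model that outputs the weights. The internals below are a generic additive-attention stand-in, not the asker's original class:

import tensorflow as tf

class AttLayer(tf.keras.layers.Layer):
    def build(self, input_shape):
        self.W = self.add_weight(shape=(input_shape[-1], 1),
                                 initializer="glorot_normal", trainable=True)

    def call(self, x):                       # x: (batch, timesteps, features)
        scores = tf.matmul(x, self.W)        # (batch, timesteps, 1)
        weights = tf.nn.softmax(scores, axis=1)
        context = tf.reduce_sum(weights * x, axis=1)
        return context, weights              # return the weights as a second output

inputs = tf.keras.Input(shape=(20, 64))
context, att_weights = AttLayer()(inputs)
pred = tf.keras.layers.Dense(1, activation="sigmoid")(context)

model = tf.keras.Model(inputs, pred)
weight_model = tf.keras.Model(inputs, att_weights)   # inspect the weights after training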
4
votes
1 answer
tf.keras.layers.MultiHeadAttention's argument key_dim sometimes does not match the paper's example
For example, I have an input with shape (1, 1000, 10) (so src.shape will be (1, 1000, 10)), which means the sequence length is 1000 and the dimension is 10. Then:
This works (random num_head and key_dim):
class Model(tf.keras.Model):
def…

EthanJiang
- 43
- 4
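For the key_dim question above, a sketch showing that key_dim is the per-head projection size, and that the layer projects back to the query's last dimension regardless, which is why key_dim does not have to equal d_model / num_heads as in the paper. The shapes follow the question's (1, 1000, 10) example:

import tensorflow as tf

src = tf.random.normal((1, 1000, 10))     # (batch, seq_len=1000, dim=10)

# key_dim need not be 10 // num_heads; queries and keys are projected to key_dim per head,
# and the final output is projected back to the input dimension (10).
mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=32)
out = mha(query=src, value=src)

print(out.shape)   # (1, 1000, 10)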
4
votes
1 answer
Multi-head attention layers - what is a wrapper multi-head layer in Keras?
I am new to attention mechanisms and I want to learn more about them by doing some practical examples. I came across a Keras implementation of multi-head attention on this website: PyPI keras-multi-head. I found two different ways to…

Amhs_11
- 233
- 3
- 10
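A rough way to see the distinction behind questions like the one above, without relying on that PyPI package's exact API: a "wrapper"-style multi-head layer simply runs several independent copies of a base layer in parallel and concatenates their outputs, whereas a true multi-head attention layer splits its query/key/value projections into heads. Both patterns below are illustrative only:

import tensorflow as tf

x = tf.random.normal((2, 6, 32))

# "Wrapper" pattern: several independent copies of a base layer, outputs concatenated.
heads = [tf.keras.layers.Dense(8)(x) for _ in range(4)]
wrapper_style = tf.keras.layers.Concatenate()(heads)          # (2, 6, 32)

# True multi-head attention: one layer that internally splits projections into heads.
mha_style = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=8)(x, x)  # (2, 6, 32)

print(wrapper_style.shape, mha_style.shape)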
4
votes
1 answer
Why is my attention model worse than the non-attention model?
My task was to translate English sentences into German sentences. I first did this with a normal encoder-decoder network, on which I got fairly good results. Then, I tried to solve the same task with the exact same model as before, but with Bahdanau…
user14349917
4
votes
3 answers
Can't set the attribute "trainable_weights", likely because it conflicts with an existing read-only
My code was running perfectly in Colab, but today it is not running. It says:
Can't set the attribute "trainable_weights", likely because it conflicts with an existing read-only @property of the object. Please choose a different name.
I am using LSTM…

Rohan kumar Yadav
- 41
- 1
- 3
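A hedged sketch of the usual fix for the error above: in recent Keras versions, trainable_weights is a read-only property of Layer, so a custom attention layer must not assign to it directly; creating variables with add_weight (or storing them under a different attribute name) avoids the clash. The layer below is a generic illustration, not the asker's original code:

import tensorflow as tf

class AttLayer(tf.keras.layers.Layer):
    def build(self, input_shape):
        # Old pattern that now fails: self.trainable_weights = [self.W]
        # add_weight registers the variable as trainable automatically instead.
        self.W = self.add_weight(name="att_weight",
                                 shape=(input_shape[-1], 1),
                                 initializer="glorot_uniform",
                                 trainable=True)

    def call(self, x):
        weights = tf.nn.softmax(tf.matmul(x, self.W), axis=1)
        return tf.reduce_sum(weights * x, axis=1)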
4
votes
1 answer
TransformerEncoder with a padding mask
I'm trying to implement torch.nn.TransformerEncoder with a src_key_padding_mask not equal to None. Imagine the input has the shape src = [20, 95] and the binary padding mask has the shape src_mask = [20, 95], with 1 in the positions of padded tokens and…

Pourya Vakilipourtakalou
- 71
- 1
- 1
- 6
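A minimal PyTorch sketch for the padding-mask question above, assuming src = [20, 95] is (batch, seq_len) of token ids; nn.TransformerEncoder expects (seq_len, batch, d_model) inputs by default, and src_key_padding_mask should be (batch, seq_len) with True at the padded positions. All sizes besides 20 and 95 are arbitrary:

import torch
import torch.nn as nn

batch, seq_len, d_model, pad_id = 20, 95, 128, 0

src = torch.randint(0, 1000, (batch, seq_len))           # (batch, seq_len) token ids
src_key_padding_mask = src.eq(pad_id)                    # True where the token is padding

embed = nn.Embedding(1000, d_model, padding_idx=pad_id)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8)
encoder = nn.TransformerEncoder(layer, num_layers=2)

x = embed(src).transpose(0, 1)                           # -> (seq_len, batch, d_model)
out = encoder(x, src_key_padding_mask=src_key_padding_mask)

print(out.shape)                                         # (95, 20, 128)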
4
votes
1 answer
Implementation details of positional encoding in transformer model?
How exactly is this positional encoding calculated?
Let's assume a machine translation scenario and these are input sentences,
english_text = [this is good, this is bad]
german_text = [das ist gut, das ist schlecht]
Now our input vocabulary…

Sai Kumar
- 665
- 2
- 9
- 21
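For the positional-encoding question above, a sketch of the sinusoidal formula from the Transformer paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); the sequence length and model dimension are arbitrary examples:

import numpy as np

def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]                 # (max_len, 1)
    i = np.arange(d_model)[None, :]                   # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])              # even indices: sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])              # odd indices: cosine
    return pe

pe = positional_encoding(max_len=6, d_model=8)        # e.g. "das ist gut" padded to length 6
print(pe.shape)                                        # (6, 8)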
4
votes
1 answer
How do attention networks work?
Recently I was going through the Attention Is All You Need paper, and while going through it I had trouble understanding the attention network if I ignore the maths behind it.
Can anyone help me understand the attention network with an example?

Kumar Mangalam
- 748
- 7
- 12
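For the "how does attention work" question above, a tiny worked example of scaled dot-product attention with NumPy: scores = Q K^T / sqrt(d_k), a softmax over the keys, then a weighted sum of the values. The numbers are arbitrary:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d_k = 4
Q = np.random.randn(3, d_k)        # 3 query positions
K = np.random.randn(5, d_k)        # 5 key positions
V = np.random.randn(5, d_k)        # one value vector per key

scores = Q @ K.T / np.sqrt(d_k)    # how much each query "looks at" each key
weights = softmax(scores, axis=-1) # each row sums to 1
output = weights @ V               # weighted sum of values, shape (3, 4)

print(weights.round(2))
print(output.shape)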
4
votes
0 answers
How to use tfa.seq2seq.BahdanauAttention with tf.keras functional API?
I want to use tfa.seq2seq.BahdanauAttention with functional API of tf.keras. I have looked at the example given at tensorflow/nmt/attention_model.py. But I couldn't figure out how to use it with tf.keras's functional API.
So I would like to use…

Manideep
- 41
- 5
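A hedged sketch of the pattern often used instead of wiring tfa.seq2seq.BahdanauAttention into the functional API directly: implement the additive (Bahdanau-style) score as a small custom layer and call it inside a functional model. This illustrates the scoring formula only, not the tfa class itself, and all sizes are arbitrary:

import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # query: (batch, hidden), values: (batch, timesteps, hidden)
        query = tf.expand_dims(query, 1)
        score = self.V(tf.nn.tanh(self.W1(values) + self.W2(query)))  # (batch, timesteps, 1)
        weights = tf.nn.softmax(score, axis=1)
        context = tf.reduce_sum(weights * values, axis=1)             # (batch, hidden)
        return context, weights

enc_in = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(8000, 128)(enc_in)
enc_out, state_h, state_c = tf.keras.layers.LSTM(128, return_sequences=True,
                                                 return_state=True)(enc_emb)

context, att_weights = BahdanauAttention(64)(state_h, enc_out)
logits = tf.keras.layers.Dense(8000)(context)     # e.g. predict the first target token

model = tf.keras.Model(enc_in, logits)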