Questions tagged [attention-model]

Questions regarding the attention mechanism in deep learning

389 questions
5
votes
0 answers

Retrieving attention weights for sentences? Most attentive sentences are zero vectors

I have a document classification task that classifies documents as good (1) or bad (0), and I use sentence embeddings for each document to classify the documents accordingly. What I would like to do is retrieve the attention scores for each…
Felix
  • 313
  • 1
  • 3
  • 22
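A minimal sketch of one common way to read the per-sentence attention weights back out after training: build the classifier with the functional API, give the softmax-over-sentences layer a name, then create a second model that shares the same layers but also outputs that layer's tensor. All layer names, sizes, and the toy data below are illustrative, not taken from the question.

```python
import numpy as np
import tensorflow as tf

# Toy setup: each document is 6 sentence embeddings of dimension 8.
sentences = tf.keras.Input(shape=(6, 8), name="sentence_embeddings")

# Simple attention pooling: score each sentence, softmax over sentences,
# then combine the sentence embeddings with those weights.
scores = tf.keras.layers.Dense(1, name="sentence_scores")(sentences)
weights = tf.keras.layers.Softmax(axis=1, name="sentence_attention")(scores)
pooled = tf.keras.layers.Dot(axes=1, name="pooled")([weights, sentences])
pooled = tf.keras.layers.Flatten()(pooled)
output = tf.keras.layers.Dense(1, activation="sigmoid", name="label")(pooled)

doc_model = tf.keras.Model(sentences, output)

# Second model sharing the same layers, exposing the attention weights too.
inspector = tf.keras.Model(
    sentences, [output, doc_model.get_layer("sentence_attention").output]
)

docs = np.random.rand(4, 6, 8).astype("float32")
preds, attn = inspector.predict(docs, verbose=0)
print(attn.shape)  # (4, 6, 1): one weight per sentence per document
```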
5
votes
2 answers

Why use multi-headed attention in Transformers?

I am trying to understand why transformers use multiple attention heads. I found the following quote: Instead of using a single attention function where the attention can be dominated by the actual word itself, transformers use multiple attention…
SomeDutchGuy
  • 2,249
  • 4
  • 16
  • 42
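For intuition, a small NumPy sketch of what "multiple heads" buys: the model dimension is split into independent heads, each with its own projections and its own softmax, so different heads can attend to different positions. Dimensions and the random matrices standing in for learned weights are purely illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model, n_heads = 5, 8, 2
d_head = d_model // n_heads

rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))
# Separate projections per head (random matrices in place of learned weights).
Wq = rng.normal(size=(n_heads, d_model, d_head))
Wk = rng.normal(size=(n_heads, d_model, d_head))
Wv = rng.normal(size=(n_heads, d_model, d_head))

heads = []
for h in range(n_heads):
    q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]
    attn = softmax(q @ k.T / np.sqrt(d_head))   # one attention pattern per head
    heads.append(attn @ v)                      # (seq_len, d_head)

# Head outputs are concatenated (and, in a real Transformer, passed through
# a final linear projection), so no single softmax has to capture everything.
out = np.concatenate(heads, axis=-1)            # (seq_len, d_model)
print(out.shape)
```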
5
votes
2 answers

Is there any way to convert a PyTorch tensor to a TensorFlow tensor?

https://github.com/taoshen58/BiBloSA/blob/ec67cbdc411278dd29e8888e9fd6451695efc26c/context_fusion/self_attn.py#L29 I need to use mulit_dimensional_attention from the above link, which is implemented in TensorFlow, but I am using PyTorch, so can I…
waleed hamid
  • 51
  • 1
  • 2
  • 5
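The usual bridge between the two frameworks is a NumPy array; a minimal sketch of converting in both directions (this copies the data, so gradients do not flow across the framework boundary):

```python
import tensorflow as tf
import torch

pt = torch.randn(2, 3, requires_grad=True)

# PyTorch -> TensorFlow: detach from autograd, move to CPU, go via NumPy.
tf_tensor = tf.convert_to_tensor(pt.detach().cpu().numpy())

# TensorFlow -> PyTorch: eager TF tensors expose .numpy() directly.
pt_back = torch.from_numpy(tf_tensor.numpy())

print(tf_tensor.shape, pt_back.shape)
```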
5
votes
1 answer

Is there a way to use the native tf Attention layer with keras Sequential API?

Is there a way to use the native tf Attention layer with keras Sequential API? I'm looking to use this particular class. I have found custom implementations such as this one. What I'm truly looking for is the use of this particular class with the…
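The native tf.keras.layers.Attention layer is called on a list [query, value], which a Sequential model (one tensor in, one tensor out per layer) cannot express, so the functional API is the usual route. A minimal sketch with arbitrary sizes:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(50,), dtype="int32")
x = tf.keras.layers.Embedding(10000, 64)(inputs)
x = tf.keras.layers.LSTM(64, return_sequences=True)(x)

# Self-attention: the sequence attends over itself (query = value).
context = tf.keras.layers.Attention()([x, x])

pooled = tf.keras.layers.GlobalAveragePooling1D()(context)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(pooled)

model = tf.keras.Model(inputs, outputs)
model.summary()
```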
5
votes
1 answer

Differences between different attention layers for Keras

I am trying to add an attention layer for my text classification model. The inputs are texts (e.g. movie review), the output is a binary outcome (e.g. positive vs negative). model = Sequential() model.add(Embedding(max_features, 32,…
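For context, Keras ships two built-in attention layers: tf.keras.layers.Attention (dot-product, Luong style) and tf.keras.layers.AdditiveAttention (Bahdanau style). Both take a [query, value] list and differ mainly in how the scores are computed. A small sketch comparing their outputs on toy tensors (return_attention_scores requires a fairly recent TensorFlow; shapes are arbitrary):

```python
import tensorflow as tf

query = tf.random.normal((2, 5, 8))   # (batch, query_len, dim)
value = tf.random.normal((2, 7, 8))   # (batch, value_len, dim)

luong = tf.keras.layers.Attention()            # score = query . key
bahdanau = tf.keras.layers.AdditiveAttention() # score = v . tanh(W1 q + W2 k)

out_l, w_l = luong([query, value], return_attention_scores=True)
out_b, w_b = bahdanau([query, value], return_attention_scores=True)

print(out_l.shape, w_l.shape)   # (2, 5, 8) (2, 5, 7)
print(out_b.shape, w_b.shape)   # (2, 5, 8) (2, 5, 7)
```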
5
votes
1 answer

Cannot parse GraphDef file in function 'ReadTFNetParamsFromTextFileOrDie' in OpenCV-DNN TensorFlow

I want to wrap the attention-OCR model with OpenCV-DNN to speed up inference. I am using the TF code from the official TF models repo. For wrapping the TF model with OpenCV-DNN, I am referring to this code. The cv2.dnn.readNetFromTensorflow()…
Chintan
  • 454
  • 6
  • 15
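A minimal sketch of the loading step only, assuming the attention-OCR graph has already been exported as a frozen inference GraphDef (file names below are placeholders). readNetFromTensorflow expects a frozen graph and an optional matching text config; passing a checkpoint or a config generated for a different graph is a common trigger for "Cannot parse GraphDef" errors.

```python
import cv2

# Placeholder file names: the .pb must be a frozen inference graph and the
# optional .pbtxt must have been generated for that same graph.
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb", "graph.pbtxt")

image = cv2.imread("sample.png")
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0 / 255.0, size=(320, 320))
net.setInput(blob)
output = net.forward()
print(output.shape)
```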
5
votes
0 answers

How to access the attention weights from the attention class

class AttLayer(Layer): def __init__(self, **kwargs): self.init = initializations.get('normal') #self.input_spec = [InputSpec(ndim=3)] super(AttLayer, self).__init__(** kwargs) def build(self, input_shape): …
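One way to make the weights accessible, whatever custom layer is used, is to have the layer's call return them alongside the context vector. A minimal sketch in current tf.keras, not the exact AttLayer from the question:

```python
import tensorflow as tf

class AttLayerWithWeights(tf.keras.layers.Layer):
    """Additive attention over timesteps that also returns its weights."""

    def build(self, input_shape):
        dim = int(input_shape[-1])
        self.W = self.add_weight(name="W", shape=(dim, 1),
                                 initializer="random_normal", trainable=True)

    def call(self, inputs):                                # (batch, time, dim)
        scores = tf.matmul(tf.tanh(inputs), self.W)        # (batch, time, 1)
        weights = tf.nn.softmax(scores, axis=1)
        context = tf.reduce_sum(weights * inputs, axis=1)  # (batch, dim)
        return context, tf.squeeze(weights, axis=-1)       # expose the weights

inputs = tf.keras.Input(shape=(10, 16))
context, attn_weights = AttLayerWithWeights()(inputs)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(context)

# Two outputs: the prediction and the attention weights for inspection.
model = tf.keras.Model(inputs, [outputs, attn_weights])
```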
4
votes
1 answer

tf.keras.layers.MultiHeadAttention's argument key_dim sometimes does not match the paper's example

For example, I have input with shape (1, 1000, 10) (so src.shape will be (1, 1000, 10)), which means the sequence length is 1000 and the dimension is 10. Then this works (arbitrary num_heads and key_dim): class Model(tf.keras.Model): def…
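A small sketch showing that key_dim is the per-head size of the query/key projections rather than the model dimension, and that the layer's output is projected back to the query's last dimension regardless of key_dim. The input shape matches the question; num_heads and key_dim are arbitrary (in the paper's example, d_model = 512, num_heads = 8, key_dim = 64):

```python
import tensorflow as tf

x = tf.random.normal((1, 1000, 10))   # (batch, seq_len, dim) as in the question

# key_dim need not equal dim // num_heads: each head projects the input to
# key_dim dimensions internally, and a final dense layer maps the
# concatenated heads back to the query's last dimension (10).
mha = tf.keras.layers.MultiHeadAttention(num_heads=3, key_dim=7)
out, scores = mha(query=x, value=x, return_attention_scores=True)

print(out.shape)      # (1, 1000, 10): matches the input, independent of key_dim
print(scores.shape)   # (1, 3, 1000, 1000): one attention map per head
```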
4
votes
1 answer

Multi-Head attention layers - what is a wrapper multi-head layer in Keras?

I am new to attention mechanisms and want to learn more by working through practical examples. I came across a Keras implementation of multi-head attention on PyPI (keras-multi-head). I found two different ways to…
4
votes
1 answer

Why is my attention model worse than a non-attention model?

My task was to convert English sentences to German sentences. I first did this with a normal encoder-decoder network, which gave fairly good results. Then I tried to solve the same task with the exact same model as before, but with Bahdanau…
user14349917
4
votes
3 answers

Can't set the attribute "trainable_weights", likely because it conflicts with an existing read-only

My code was running perfectly in Colab, but today it's not running. It says: Can't set the attribute "trainable_weights", likely because it conflicts with an existing read-only @property of the object. Please choose a different name. I am using LSTM…
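This error typically comes from older custom attention layers that assign to self.trainable_weights, which is now a read-only property in tf.keras. A minimal sketch of the usual fix, registering the variable through add_weight instead (layer shape and initializer are illustrative):

```python
import tensorflow as tf

class AttLayer(tf.keras.layers.Layer):
    def build(self, input_shape):
        dim = int(input_shape[-1])
        # Old tutorials did something like:
        #   self.W = K.variable(...); self.trainable_weights = [self.W]
        # Assigning to `trainable_weights` now raises the read-only error.
        # add_weight registers the variable as trainable automatically.
        self.W = self.add_weight(name="att_weight", shape=(dim, 1),
                                 initializer="glorot_normal", trainable=True)

    def call(self, inputs):                               # (batch, time, dim)
        scores = tf.nn.softmax(tf.matmul(tf.tanh(inputs), self.W), axis=1)
        return tf.reduce_sum(scores * inputs, axis=1)     # (batch, dim)
```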
4
votes
1 answer

TransformerEncoder with a padding mask

I'm trying to implement torch.nn.TransformerEncoder with a src_key_padding_mask that is not None. Imagine the input has the shape src = [20, 95] and the binary padding mask has the shape src_mask = [20, 95], with 1 at the positions of padded tokens and…
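A minimal PyTorch sketch of passing a boolean src_key_padding_mask of shape [batch, seq_len], where True marks padded positions to be ignored. The embedding dimension below is an arbitrary choice, and batch_first=True (available in reasonably recent PyTorch) keeps the input in [batch, seq_len, dim] layout:

```python
import torch
import torch.nn as nn

batch_size, seq_len, d_model = 20, 95, 32   # d_model chosen arbitrarily

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

src = torch.randn(batch_size, seq_len, d_model)

# Boolean mask, one row per sequence: True marks padded positions to ignore.
lengths = torch.randint(low=50, high=seq_len + 1, size=(batch_size,))
padding_mask = torch.arange(seq_len)[None, :] >= lengths[:, None]   # (20, 95)

out = encoder(src, src_key_padding_mask=padding_mask)
print(out.shape)   # torch.Size([20, 95, 32])
```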
4
votes
1 answer

Implementation details of positional encoding in transformer model?

How exactly is this positional encoding calculated? Let's assume a machine translation scenario where these are the input sentences: english_text = [this is good, this is bad], german_text = [das ist gut, das ist schlecht]. Now our input vocabulary…
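For reference, the sinusoidal encoding from the paper is PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); a short NumPy sketch (the model dimension and sentence length are arbitrary):

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding, shape (max_len, d_model)."""
    pos = np.arange(max_len)[:, None]                   # (max_len, 1)
    i = np.arange(d_model)[None, :]                     # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (i // 2)) / d_model)
    angles = pos * angle_rates                          # (max_len, d_model)
    angles[:, 0::2] = np.sin(angles[:, 0::2])           # even indices: sine
    angles[:, 1::2] = np.cos(angles[:, 1::2])           # odd indices: cosine
    return angles

# The encoding is added to the token embeddings, e.g. for "this is good":
pe = positional_encoding(max_len=3, d_model=8)
print(pe.shape)   # (3, 8)
```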
4
votes
1 answer

How do attention networks work?

Recently I was going through the Attention Is All You Need paper, and while reading it I had trouble understanding the attention network if I ignore the maths behind it. Can anyone help me understand the attention network with an example?
Kumar Mangalam
  • 748
  • 7
  • 12
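Setting the surrounding machinery aside, the core of the paper is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A tiny NumPy example with random toy values shows that each output row is just a weighted average of the value rows, with the weights saying how much each token "looks at" the others:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# 3 tokens, 4-dimensional queries/keys/values (toy numbers).
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

scores = Q @ K.T / np.sqrt(Q.shape[-1])   # similarity of each token to the others
weights = softmax(scores, axis=-1)        # rows sum to 1
output = weights @ V                      # weighted average of the value vectors

print(np.round(weights, 2))   # attention matrix: row i = token i's focus
print(output.shape)           # (3, 4)
```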
4
votes
0 answers

How to use tfa.seq2seq.BahdanauAttention with tf.keras functional API?

I want to use tfa.seq2seq.BahdanauAttention with the functional API of tf.keras. I have looked at the example given in tensorflow/nmt/attention_model.py, but I couldn't figure out how to use it with tf.keras's functional API. So I would like to use…