Questions tagged [attention-model]

Questions about the attention mechanism in deep learning models

389 questions
0 votes, 1 answer

Why does the output of an attention decoder need to be combined with the attention context?

legacy_seq2seq in TensorFlow: x = linear([inp] + attns, input_size, True) # Run the RNN. cell_output, state = cell(x, state) # Run the attention mechanism. if i == 0 and initial_state_attention: with…
Yutao ZHU · 88
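For context, here is a minimal NumPy sketch of the combination step this question is about: in attention decoders of this style, the per-step output is a projection of the cell output concatenated with the attention context, not the cell output alone. The shapes and the linear helper below are illustrative, not the legacy_seq2seq code itself.

```python
import numpy as np

# Hypothetical sizes for illustration only.
hidden_size, attn_size = 4, 4

def linear(inputs, W):
    # Concatenate inputs along the feature axis and apply one dense layer,
    # mirroring the role of linear([...]) in legacy_seq2seq-style decoders.
    return np.concatenate(inputs, axis=-1) @ W

rng = np.random.default_rng(0)
cell_output = rng.normal(size=(1, hidden_size))   # RNN cell output at step i
attn_context = rng.normal(size=(1, attn_size))    # attention context for step i
W_out = rng.normal(size=(hidden_size + attn_size, hidden_size))

# The emitted decoder output projects BOTH the cell output and the
# attention context, so the prediction can use what was attended to.
step_output = linear([cell_output, attn_context], W_out)
print(step_output.shape)  # (1, 4)
```
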
0 votes, 1 answer

Tensorflow attention OCR inference code

I am trying to run the attention OCR model in the TensorFlow models repository, https://github.com/tensorflow/models/tree/master/attention_ocr. I can find the scripts for training and evaluating on the FSNS dataset, but they do not provide code to run inference on a…
0 votes, 1 answer

Training Method Choice for seq2seq model

What training method would you recommend for an attention-based sequence-to-sequence neural machine translation model: SGD, Adadelta, Adam, or something better? Please give some advice, thanks.
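As a point of reference, Adam with gradient clipping is a common default for attention-based seq2seq NMT; SGD with learning-rate decay and Adadelta are also used. The sketch below only shows the optimizer configuration in Keras; the model object and data pipeline are assumed to exist elsewhere and are hypothetical.

```python
import tensorflow as tf

# Illustrative defaults: Adam plus gradient clipping to keep recurrent
# gradients from exploding on long sequences.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=5.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# model.compile(optimizer=optimizer, loss=loss)   # hypothetical model object
```
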
0 votes, 1 answer

Multiply a matrix with another matrix of a different shape in the Keras backend

I'm trying to implement an attention model based on this model, but I don't want it to look at just one frame to decide the attention for that frame; I want a model that looks at each frame with respect to the whole sequence. So what I'm doing…
m1sk · 308
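One common way to score each frame against the whole sequence in the Keras backend is batched matrix multiplication with K.batch_dot; the shapes below are illustrative, and this is a sketch of the pattern rather than the asker's model.

```python
import tensorflow as tf
from tensorflow.keras import backend as K

# Hypothetical shapes: a batch of 2 sequences, 10 frames, 64-dim features.
batch, timesteps, dim = 2, 10, 64
frames = tf.random.normal((batch, timesteps, dim))

# Score every frame against every other frame in the same sequence:
# (batch, T, dim) x (batch, dim, T) -> (batch, T, T)
scores = K.batch_dot(frames, K.permute_dimensions(frames, (0, 2, 1)))
weights = K.softmax(scores, axis=-1)

# Each frame's new representation is a weighted sum over the whole sequence.
attended = K.batch_dot(weights, frames)   # (batch, T, dim)
print(attended.shape)
```
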
0 votes, 1 answer

Extracting attention matrix with TensorFlow's seq2seq example code during decoding

It seems like the attention() method used to compute the attention mask in seq2seq_model.py, from TensorFlow's example sequence-to-sequence code, is not called during decoding. Does anyone know how to resolve this? A similar…
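The usual workaround, independent of that specific codebase, is to have the step function return its attention weights and collect them while decoding instead of recomputing them afterwards. The NumPy sketch below uses a toy attend() stand-in for the model's attention call, with illustrative sizes.

```python
import numpy as np

def attend(query, encoder_states):
    # Toy dot-product attention: score, normalize, take a weighted sum.
    scores = encoder_states @ query                    # (src_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights @ encoder_states                 # (hidden,)
    return context, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(7, 16))              # 7 source positions, hidden=16
state = rng.normal(size=16)

collected = []
for step in range(5):                                  # 5 decode steps (illustrative)
    context, weights = attend(state, encoder_states)
    collected.append(weights)                          # keep the weights per step
    state = np.tanh(state + context)                   # toy state update

attention_matrix = np.stack(collected)                 # (target_len, src_len)
print(attention_matrix.shape)
```
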
-1 votes, 0 answers

Simple RNN with Attention mechanism vs LSTM without attention mechanism

Simple RNN with an attention mechanism vs. LSTM without an attention mechanism: which will perform better? In general, it's difficult to definitively say whether a simple RNN with an attention mechanism or an LSTM without an attention mechanism will perform…
-1 votes, 0 answers

Calculating Sentence Level Attention

How do I quantify the attention between input and output sentences in a sequence-to-sequence language modelling scenario (translation or summarization)? For instance, consider these input and output statements, i.e., the document is the input, and…
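One straightforward approach, sketched below with made-up sizes, is to start from the token-level attention matrix and aggregate it over token-to-sentence assignments; averaging within each sentence pair is shown here, and summing is another common choice. The arrays and mappings are illustrative, not taken from the question.

```python
import numpy as np

# Hypothetical token-level attention: rows = output tokens, cols = input tokens.
attn = np.random.dirichlet(np.ones(6), size=4)            # (4 output tokens, 6 input tokens)
in_sent_of_token = np.array([0, 0, 0, 1, 1, 1])            # input sentence id per input token
out_sent_of_token = np.array([0, 0, 1, 1])                 # output sentence id per output token

n_out_sents, n_in_sents = 2, 2
sent_attn = np.zeros((n_out_sents, n_in_sents))
for o in range(n_out_sents):
    for i in range(n_in_sents):
        # Take the block of token-to-token weights for this sentence pair
        # and average it into one sentence-level attention score.
        block = attn[np.ix_(out_sent_of_token == o, in_sent_of_token == i)]
        sent_attn[o, i] = block.mean()

print(sent_attn)   # (output sentences, input sentences)
```
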
-1 votes, 0 answers

How to use an attention mechanism to learn four weights

I am a beginner in graph neural networks and I want to use an attention mechanism to learn weights for four results, so that they can be weighted and summed to obtain the final result. I expect to implement an attention class that learns four weights and computes a weighted…
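A minimal sketch of this idea in PyTorch: score each of the four result vectors, softmax the scores into four weights that sum to one, and return the weighted sum. Dimensions, names, and the scoring layer are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FourWayAttention(nn.Module):
    """Score four candidate result vectors, softmax the scores into four
    weights, and return their weighted sum plus the weights themselves."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # one scalar score per result vector

    def forward(self, results):                       # results: (batch, 4, dim)
        scores = self.score(results)                  # (batch, 4, 1)
        weights = torch.softmax(scores, dim=1)        # sums to 1 over the 4 results
        fused = (weights * results).sum(dim=1)        # (batch, dim)
        return fused, weights.squeeze(-1)

# Usage with hypothetical shapes.
fuse = FourWayAttention(dim=16)
out, w = fuse(torch.randn(8, 4, 16))
print(out.shape, w.shape)   # torch.Size([8, 16]) torch.Size([8, 4])
```
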
-1 votes, 1 answer

Defining the dimensions of NMT and image captioning models with attention in the decoder

I have been checking out the models with attention in the tutorials below: https://www.tensorflow.org/tutorials/text/nmt_with_attention and https://www.tensorflow.org/tutorials/text/image_captioning. In both tutorials, I do not understand the defining…
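To make the dimensions concrete, here is a Bahdanau-style attention layer with explicit shape comments, roughly following the pattern those tutorials use; the layer names and sizes are illustrative, not the tutorials' exact code.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)   # projects the decoder query
        self.W2 = tf.keras.layers.Dense(units)   # projects the encoder outputs
        self.V = tf.keras.layers.Dense(1)        # collapses to one score per position

    def call(self, query, values):
        # query:  (batch, hidden)          decoder state at one step
        # values: (batch, src_len, hidden) encoder outputs / image features
        query_with_time = tf.expand_dims(query, 1)                               # (batch, 1, hidden)
        score = self.V(tf.nn.tanh(self.W1(query_with_time) + self.W2(values)))   # (batch, src_len, 1)
        weights = tf.nn.softmax(score, axis=1)                                   # (batch, src_len, 1)
        context = tf.reduce_sum(weights * values, axis=1)                        # (batch, hidden)
        return context, weights
```
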
-1 votes, 1 answer

Adding softmax significantly changes weight updates

I have a neural network of the form N = W1 * Tanh(W2 * I), where I is the input vector/matrix. When I learn these weights, the output has a certain form. However, when I add a normalization layer, for example, N' = Softmax( W1 * Tanh(W2 * I) )…
Rumu · 403
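One way to see why the updates change, stated in general terms rather than as an analysis of this specific network: softmax couples every output to every logit, so the gradient reaching W1 is rescaled and mixed across components instead of flowing back independently. With s = softmax(z):

```latex
\frac{\partial s_i}{\partial z_j} = s_i\,\left(\delta_{ij} - s_j\right)
```

The diagonal terms s_i(1 - s_i) are at most 1/4 and shrink as the softmax saturates, so gradients through the normalization are both damped and redistributed relative to the unnormalized network.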
-2 votes, 1 answer

Feeding an image to stacked resnet blocks to create an embedding

Do you have any code example or paper that refers to something like the following diagram? I want to know why we stack multiple ResNet blocks, as opposed to multiple convolutional blocks as in more traditional architectures. Any code sample…
Mona Jalal · 34,860
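For reference, a minimal PyTorch sketch of the general pattern: identity shortcuts let many blocks be stacked without the optimization problems of an equally deep plain convolutional stack, and pooling the final feature map gives a fixed-size embedding. Channel counts and block depth below are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """One residual block: the input skips around two conv layers,
    so stacking many blocks stays easy to optimize."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))   # identity shortcut + residual

# Stack several blocks, then pool to a fixed-size embedding.
encoder = nn.Sequential(
    nn.Conv2d(3, 64, 7, stride=2, padding=3),
    *[ResBlock(64) for _ in range(4)],
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),                              # -> (batch, 64) embedding
)
print(encoder(torch.randn(2, 3, 224, 224)).shape)   # torch.Size([2, 64])
```
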
-2 votes, 1 answer

Getting CUDA out of memory while running the Longformer model in Google Colab; similar code using BERT works fine

I am working on text classification using the Longformer model. Even with just the first 100 rows of the dataframe I get a memory error. I am using Google Colab. This is my model: model =…
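Longformer's memory footprint grows with sequence length (its default window is 4096 tokens, versus 512 for BERT), so the usual levers on Colab-sized GPUs are shorter inputs, smaller batches, gradient accumulation, and mixed precision. The values below are illustrative, not tuned for the asker's dataset, and fp16 assumes a GPU runtime.

```python
from transformers import TrainingArguments

# Common memory-saving settings when fine-tuning Longformer on a small GPU.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,     # smallest possible per-step batch
    gradient_accumulation_steps=8,     # keep the effective batch size at 8
    fp16=True,                         # mixed precision roughly halves activation memory
)

# Also tokenize with a shorter window than the 4096-token default, e.g.:
# tokenizer(texts, truncation=True, max_length=1024, padding="max_length")
```
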
-3 votes, 0 answers

Methods for Programmatically Generating 'Attention Is All You Need' Diagrams

Is there a way to create nodes that overlap with each other to show that there is a "stack" of that type, like in the "Attention Is All You Need" paper (maybe using Mermaid), or any other code-based method? An example: If this is not possible…