Questions regarding the attention mechanism in deep learning models
Questions tagged [attention-model]
389 questions
0
votes
1 answer
Why does the output of an attention decoder need to be combined with attention?
legacy_seq2seq in TensorFlow:
# Combine the decoder input with the attention context vectors.
x = linear([inp] + attns, input_size, True)
# Run the RNN.
cell_output, state = cell(x, state)
# Run the attention mechanism.
if i == 0 and initial_state_attention:
with…
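
For context, the usual answer is that the cell output alone only reflects the decoder state; projecting it together with the attention context lets the prediction also see the attended encoder states. A minimal NumPy sketch of that projection (shapes and names are illustrative, not the actual TensorFlow variables):

import numpy as np

# Minimal sketch of the step-output projection in an attention decoder.
# `cell_output` comes from cell(x, state); `attns` is the attention context.
hidden_size, attn_size, output_size = 4, 6, 5
cell_output = np.random.randn(1, hidden_size)
attns = np.random.randn(1, attn_size)
W = np.random.randn(hidden_size + attn_size, output_size)  # the "linear" layer
b = np.zeros(output_size)

# Equivalent in spirit to: output = linear([cell_output] + attns, output_size, True)
output = np.concatenate([cell_output, attns], axis=1) @ W + b
print(output.shape)  # (1, output_size)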

Yutao ZHU
- 88
- 5
0
votes
1 answer
TensorFlow attention OCR inference code
I am trying to run the attention OCR model in the TensorFlow models repository, https://github.com/tensorflow/models/tree/master/attention_ocr . I can find the scripts for training and evaluating on the FSNS dataset, but there is no code to run inference on a…

Shailesh Acharya
- 1
- 3
0
votes
1 answer
Training Method Choice for seq2seq model
What training method would you recommend for an attention-based sequence-to-sequence neural machine translation model? SGD, Adadelta, Adam, or something better? Please give some advice, thanks.
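
One hedged way to compare the candidates is to keep the model fixed and swap the optimizer at compile time; Adam is a common default for attention-based NMT. A minimal Keras sketch (the toy model below is illustrative, not an actual NMT network):

import tensorflow as tf

# Toy stand-in for a seq2seq model; the point is only the optimizer choice.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=1000, output_dim=64),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.Dense(1000, activation="softmax"),
])

# Adam is a common default; SGD or Adadelta are one-line swaps, so the
# candidates can be compared directly on a validation set.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
# optimizer = tf.keras.optimizers.SGD(learning_rate=1.0, clipnorm=5.0)
# optimizer = tf.keras.optimizers.Adadelta(learning_rate=1.0)

model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy")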

陶恺大天才
- 295
- 1
- 3
- 7
0
votes
1 answer
Multiply a matrix by another matrix of a different shape in the Keras backend
I'm trying to implement an attention model based on this model,
but I don't want my model to look at just one frame to decide the attention for that frame; I want a model that looks at each frame with respect to the whole sequence. So what I'm doing…
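
A hedged sketch of "attention over the whole sequence" with the Keras backend: score every frame against all frames with K.batch_dot, so tensors of different shapes, (batch, T, d) and (batch, T, T), get multiplied without manual reshaping. All shapes below are illustrative:

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

batch, timesteps, features = 2, 8, 16
frames = K.constant(np.random.randn(batch, timesteps, features))

# Score each frame against every other frame, then attend over the sequence.
scores = K.batch_dot(frames, frames, axes=(2, 2))          # (batch, T, T)
weights = K.softmax(scores / np.sqrt(features), axis=-1)   # attention per frame
context = K.batch_dot(weights, frames, axes=(2, 1))        # (batch, T, features)
print(K.int_shape(context))  # (2, 8, 16)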

m1sk
- 308
- 2
- 10
0
votes
1 answer
Extracting attention matrix with TensorFlow's seq2seq example code during decoding
It seems that the attention() method used to compute the attention mask in seq2seq_model.py, the example TensorFlow sequence-to-sequence code, is not called during decoding.
Does anyone know how to resolve this? A similar…

EXeLicA
- 1
- 1
-1
votes
0 answers
Simple RNN with Attention mechanism vs LSTM without attention mechanism
A simple RNN with an attention mechanism vs. an LSTM without an attention mechanism: which will perform better?
In general, it's difficult to definitively say whether a simple RNN with an attention mechanism or an LSTM without an attention mechanism will perform…

Aditya Jindal
- 1
- 2
-1
votes
0 answers
Calculating Sentence Level Attention
How do I quantify the attention between input and output sentences in a sequence-to-sequence language modelling scenario [translation or summarization]?
For instance, consider these input and output statements, i.e., the document is the input, and…
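
A hedged sketch of one way to quantify it: start from a token-level attention matrix (rows = output tokens, columns = input tokens) and sum the attention mass that falls inside each output-sentence/input-sentence block. The matrix and sentence spans below are made up for illustration:

import numpy as np

token_attn = np.random.rand(6, 10)
token_attn /= token_attn.sum(axis=1, keepdims=True)   # each row sums to 1

out_sents = [(0, 3), (3, 6)]    # output token spans per output sentence
in_sents = [(0, 4), (4, 10)]    # input token spans per input sentence

sent_attn = np.zeros((len(out_sents), len(in_sents)))
for i, (os, oe) in enumerate(out_sents):
    for j, (s, e) in enumerate(in_sents):
        # Average attention an output sentence pays to each input sentence.
        sent_attn[i, j] = token_attn[os:oe, s:e].sum() / (oe - os)
print(sent_attn)  # rows sum to ~1
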
-1
votes
0 answers
How to use an attention mechanism to learn four weights
I am a beginner in graph neural networks, and I want to use an attention mechanism to learn weights for four results so that they can be weighted and summed to obtain the final result.
I expect to implement an attention class that learns four weights and computes a weighted…
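
A minimal sketch of such a class, assuming the four results arrive stacked as a (batch, 4, dim) tensor; the layer name and API below are illustrative:

import tensorflow as tf

class FourWayAttention(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()
        # One scalar score per branch, learned end to end.
        self.score = tf.keras.layers.Dense(1, use_bias=False)

    def call(self, results):                              # (batch, 4, dim)
        scores = self.score(results)                      # (batch, 4, 1)
        weights = tf.nn.softmax(scores, axis=1)           # the four attention weights
        return tf.reduce_sum(weights * results, axis=1)   # weighted sum, (batch, dim)

x = tf.random.normal((8, 4, 32))     # four intermediate results of dimension 32
print(FourWayAttention()(x).shape)   # (8, 32)
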
-1
votes
1 answer
Defining the dimensions of NMT and image captioning with attention in the decoder
I have been checking out models with attention in the tutorials below.
https://www.tensorflow.org/tutorials/text/nmt_with_attention
and
https://www.tensorflow.org/tutorials/text/image_captioning
In both tutorials, I do not understand the defining…
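
For reference, both tutorials use Bahdanau-style attention, where the key dimensions are the encoder feature size, the decoder hidden size, and a shared scoring size. A hedged sketch with explicit shapes (all numbers are illustrative):

import tensorflow as tf

batch, T, enc_units, dec_units, units = 4, 10, 16, 32, 8

features = tf.random.normal((batch, T, enc_units))   # encoder outputs / image features
hidden = tf.random.normal((batch, dec_units))        # decoder hidden state

W1 = tf.keras.layers.Dense(units)   # projects features -> (batch, T, units)
W2 = tf.keras.layers.Dense(units)   # projects hidden   -> (batch, 1, units)
V = tf.keras.layers.Dense(1)        # score per step    -> (batch, T, 1)

score = V(tf.nn.tanh(W1(features) + W2(tf.expand_dims(hidden, 1))))
attention_weights = tf.nn.softmax(score, axis=1)               # (batch, T, 1)
context = tf.reduce_sum(attention_weights * features, axis=1)  # (batch, enc_units)
print(context.shape)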

Jun
- 3
- 1
-1
votes
1 answer
Adding softmax significantly changes weight updates
I have a neural network of the form N = W1 * Tanh(W2 * I), where I is the input vector/matrix. When I learn these weights, the output has a certain form. However, when I add a normalization layer, for example, N' = Softmax( W1 * Tanh(W2 * I) )…
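
A plausible explanation is that softmax has no parameters of its own, but its Jacobian, diag(s) - s s^T, couples every output, so the gradient reaching W1 is mixed across all logits. A minimal NumPy sketch with made-up numbers:

import numpy as np

z = np.array([2.0, -1.0, 0.5])            # logits N = W1 * tanh(W2 * I)
s = np.exp(z) / np.exp(z).sum()           # N' = Softmax(N)

jacobian = np.diag(s) - np.outer(s, s)    # dN'/dN couples all three outputs
upstream = np.array([1.0, 0.0, 0.0])      # gradient w.r.t. N' from the loss

grad_without_softmax = upstream           # flows straight into the linear layer
grad_with_softmax = jacobian @ upstream   # mixed across all outputs
print(grad_without_softmax, grad_with_softmax)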

Rumu
- 403
- 1
- 3
- 10
-2
votes
1 answer
Feeding an image to stacked resnet blocks to create an embedding
Do you have any code example or paper that refers to something like the following diagram?
I want to know why we would stack multiple ResNet blocks, as opposed to multiple convolutional blocks as in more traditional architectures. Any code sample…
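
As a hedged sketch of the usual answer: the skip connection in each ResNet block keeps gradients flowing when many blocks are stacked, which plain stacked convolutional blocks lack. A minimal Keras example that stacks a few blocks and pools them into an embedding (all sizes are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def resnet_block(x, filters):
    shortcut = x
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    if shortcut.shape[-1] != filters:                  # match channel count
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    return layers.ReLU()(layers.Add()([x, shortcut]))  # residual connection

inputs = tf.keras.Input(shape=(64, 64, 3))
x = inputs
for _ in range(3):                                     # stack several blocks
    x = resnet_block(x, 32)
embedding = layers.GlobalAveragePooling2D()(x)         # image embedding vector
model = tf.keras.Model(inputs, embedding)
model.summary()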

Mona Jalal
- 34,860
- 64
- 239
- 408
-2
votes
1 answer
Getting CUDA out of memory while running a Longformer model in Google Colab; similar code using BERT works fine
I am working on text classification using a Longformer model. Even with just the first 100 rows of my dataframe, I get a memory error. I am using Google Colab.
This is my model:
model =…
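
A hedged sketch of the usual memory levers for Longformer in Colab: shorter sequences, a very small batch size, and gradient checkpointing. The checkpoint name and lengths below are illustrative, not taken from the original question:

from transformers import LongformerForSequenceClassification, LongformerTokenizerFast

tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
model = LongformerForSequenceClassification.from_pretrained(
    "allenai/longformer-base-4096", num_labels=2
)

model.gradient_checkpointing_enable()    # trade compute for memory
encodings = tokenizer(
    ["example document"], truncation=True, padding="max_length",
    max_length=1024,                     # well below the 4096 maximum
    return_tensors="pt",
)
# Train with a batch size of 1-2 and, if needed, fp16 (e.g. torch.cuda.amp)
# so the long attention window fits into Colab's GPU memory.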

Sandeep Pathania
- 1
- 1
- 3
-3
votes
0 answers
Methods for Programmatically Generating 'Attention Is All You Need' Diagrams
Is there a way to create nodes that overlap with each other to show that there is a "stack" of that type, as in the "Attention Is All You Need" paper (maybe using Mermaid), or any other code-based method?
An example:
If this is not possible…

Don Yin
- 1
-3
votes
1 answer
Difference between Model(inputs=[input],outputs=[output1,output2]) and Model(inputs=[input],outputs=[output1]+output2) in KERAS?
Please check out the last line of the code.
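
A hedged guess at what that last line is doing: if output2 is itself a list of tensors, [output1] + output2 is plain Python list concatenation, so the model gets one flat list of outputs, whereas [output1, output2] would keep the list nested as a single structured output. A minimal sketch (layer names are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(8,))
output1 = layers.Dense(1, name="head_a")(inputs)
output2 = [layers.Dense(1, name="head_b")(inputs),
           layers.Dense(1, name="head_c")(inputs)]

# List concatenation: three separate outputs (head_a, head_b, head_c).
flat_model = tf.keras.Model(inputs=[inputs], outputs=[output1] + output2)
print(len(flat_model.outputs))  # 3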

Vishal Singh
- 3
- 1