
Why do we reverse the input when feeding it into a seq2seq model in TensorFlow (tf.reverse(inputs, [-1]))?

training_predictions, test_predictions = seq2seq_model(tf.reverse(inputs, [-1]),
                                                       targets,
                                                       keep_prob,
                                                       batch_size,
                                                       seq_length,
                                                       len(answerswords2int),
                                                       len(questionswords2int),
                                                       encoding_embedding_size,
                                                       decoding_embedding_size,
                                                       rnn_size,
                                                       num_layers,
                                                       questionswords2int)
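
I understand what the call does mechanically. As a toy sketch (TensorFlow 2.x eager mode assumed; the values below are made up, not the real inputs tensor), tf.reverse(inputs, [-1]) flips the last axis, i.e. the time dimension of a [batch_size, seq_length] batch of token IDs:

import tensorflow as tf

# Toy batch of token IDs, shape [batch_size=2, seq_length=4].
inputs = tf.constant([[11, 12, 13, 14],
                      [21, 22, 23, 24]])

# tf.reverse(inputs, [-1]) flips the last axis (the time dimension here),
# so each sentence is fed to the encoder back to front.
reversed_inputs = tf.reverse(inputs, [-1])

print(reversed_inputs.numpy())
# [[14 13 12 11]
#  [24 23 22 21]]

My question is why this reversal helps at all.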
pushkin

2 Answers


To the best of my knowledge, reversing the input originates from the paper Sequence to Sequence Learning with Neural Networks.

The idea originated in machine translation (I'm not sure how it plays out in other domains, e.g. chatbots). Think of the following scenario (borrowed from the original paper). You want to translate:

A B C -> alpha beta gamma delta

In this setting, the encoder has to go through the full source sequence (A B C) before the decoder starts predicting alpha, by which point it might already have forgotten about A. But if you feed it as:

C B A -> alpha beta gamma delta

You have a strong communication link from A to alpha, where A is "probably" related to alpha in the translation.

Note: This entirely depends on your translation task. If the target language is written in the reverse order of the source language (e.g. translating from a subject-verb-object language to an object-verb-subject language), I think it's better to keep the original order.
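
As a practical side note (a sketch of my own, not from the question's code): if your batch is padded, tf.reverse also flips the padding to the front of each sentence. tf.reverse_sequence reverses only the first seq_lengths[i] real tokens of each row, which is usually what you want. The tensors below are toy values:

import tensorflow as tf

# Toy padded batch: two source sentences of true lengths 3 and 4,
# padded with 0 up to seq_length = 5.
inputs = tf.constant([[ 5,  6,  7,  0,  0],
                      [ 8,  9, 10, 11,  0]])
source_lengths = tf.constant([3, 4])

# Reverse only the real tokens of each row, leaving the padding in place.
reversed_inputs = tf.reverse_sequence(inputs,
                                      seq_lengths=source_lengths,
                                      seq_axis=1,
                                      batch_axis=0)

print(reversed_inputs.numpy())
# [[ 7  6  5  0  0]
#  [11 10  9  8  0]]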

thushv89

While the LSTM is capable of solving problems with long term dependencies, we discovered that the LSTM learns much better when the source sentences are reversed (the target sentences are not reversed). By doing so, the LSTM’s test perplexity dropped from 5.8 to 4.7, and the test BLEU scores of its decoded translations increased from 25.9 to 30.6.

While we do not have a complete explanation to this phenomenon, we believe that it is caused by the introduction of many short term dependencies to the dataset. Normally, when we concatenate a source sentence with a target sentence, each word in the source sentence is far from its corresponding word in the target sentence. As a result, the problem has a large “minimal time lag” [17]. By reversing the words in the source sentence, the average distance between corresponding words in the source and target language is unchanged. However, the first few words in the source language are now very close to the first few words in the target language, so the problem’s minimal time lag is greatly reduced. Thus, backpropagation has an easier time “establishing communication” between the source sentence and the target sentence, which in turn results in substantially improved overall performance.

Initially, we believed that reversing the input sentences would only lead to more confident predictions in the early parts of the target sentence and to less confident predictions in the later parts. However, LSTMs trained on reversed source sentences did much better on long sentences than LSTMs trained on the raw source sentences.

Paper: https://arxiv.org/abs/1409.3215
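
To see the "minimal time lag" argument numerically, here is a tiny toy check (my own illustration, not from the paper). With a source and target of equal length n laid out one after the other, reversing the source keeps the average distance between corresponding words at n, but shrinks the smallest distance to 1:

# Toy check of the "minimal time lag" argument for equal-length source/target.
n = 4

# Forward order: source word i sits at position i, its target counterpart at
# position n + i, so every corresponding pair is exactly n steps apart.
forward_gaps = [(n + i) - i for i in range(n)]             # [4, 4, 4, 4]

# Reversed source: word i moves to position n - 1 - i, so the first pairs get
# very close while later pairs move further apart.
reversed_gaps = [(n + i) - (n - 1 - i) for i in range(n)]  # [1, 3, 5, 7]

print(sum(forward_gaps) / n, min(forward_gaps))    # 4.0 4  -> large minimal lag
print(sum(reversed_gaps) / n, min(reversed_gaps))  # 4.0 1  -> same average, tiny minimal lag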