
Are there any successful applications of a deep seq2seq model where the decoder reads ONLY the encoder's output state (the final step of the encoder's internal state) at its first step, and then carries out multiple decoding steps on its own?

I.e. no peeking, no attention, etc. At each step, the decoder's input is only the previous step's output and its own state.

I have seen a few seq2seq autoencoder implementations and wonder whether they really converge after a long time of training, especially when the internal state is small. A rough sketch of the setup I mean is below.
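To be concrete, here is a minimal sketch of the architecture in question, assuming PyTorch; the module and variable names are my own and purely illustrative. The decoder sees the encoder only through its final hidden state and then decodes greedily, feeding back its own predictions.

```python
# Minimal sketch (hypothetical names): decoder is conditioned solely on the
# encoder's final hidden state; no attention, no access to encoder outputs.
import torch
import torch.nn as nn

class PlainSeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src, max_len=20, bos_id=1):
        # Encode the source; keep ONLY the final hidden state.
        _, h = self.encoder(self.src_emb(src))        # h: (1, batch, hidden)

        # Decode step by step: each input is the previous step's output token,
        # and the hidden state is carried forward from step to step.
        batch = src.size(0)
        token = torch.full((batch, 1), bos_id, dtype=torch.long,
                           device=src.device)
        logits = []
        for _ in range(max_len):
            dec_out, h = self.decoder(self.tgt_emb(token), h)
            step_logits = self.out(dec_out)           # (batch, 1, tgt_vocab)
            logits.append(step_logits)
            token = step_logits.argmax(-1)            # feed back own prediction
        return torch.cat(logits, dim=1)               # (batch, max_len, tgt_vocab)

# Toy usage:
model = PlainSeq2Seq(src_vocab=100, tgt_vocab=100)
src = torch.randint(0, 100, (2, 15))                  # batch of 2, length 15
out = model(src)                                       # (2, 20, 100)
```

(In training one would normally use teacher forcing rather than feeding back the argmax, but the conditioning on the encoder is the same: only the final state is passed across.)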

1 Answer


Using only the encoder's last hidden state without attention gives insufficient representational power, especially when the hidden size is small. A few systems from before the invention of attention are:

https://arxiv.org/abs/1409.3215

https://arxiv.org/abs/1506.05869

Peixiang Zhong