Are there any successful applications of a deep seq2seq model where the decoder reads ONLY the encoder's output state (the final step of the encoder's internal state) at its first step, and then carries out multi-step decoding?
That is, no peeking and no attention: at each step the decoder's input is only the previous step's output and its own state.
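For concreteness, here is a minimal PyTorch sketch of the setup I mean (class names, sizes, and greedy decoding are my own choices, purely illustrative):

```python
import torch
import torch.nn as nn

class PlainSeq2Seq(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder_cell = nn.GRUCell(emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src, max_len, bos_idx):
        # Encoder: keep only the final hidden state; all intermediate
        # encoder outputs are discarded (no attention, no peeking).
        _, h = self.encoder(self.embed(src))   # h: (1, B, hidden_dim)
        h = h.squeeze(0)                       # (B, hidden_dim)

        # Decoder: initialized from the encoder's final state, starts
        # from <BOS>; each step sees only its previous output (greedy
        # argmax here) and its own recurrent state.
        tok = torch.full((src.size(0),), bos_idx,
                         dtype=torch.long, device=src.device)
        logits = []
        for _ in range(max_len):
            h = self.decoder_cell(self.embed(tok), h)
            step_logits = self.out(h)
            logits.append(step_logits)
            tok = step_logits.argmax(dim=-1)   # feed back its own output
        return torch.stack(logits, dim=1)      # (B, max_len, vocab_size)
```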
I have seen a few seq2seq autoencoder implementations built this way, and I wonder whether they really converge after a long time of training, especially when the internal state is small.