Though I am not sure whether Stack Overflow is the right place to ask a theoretical question that is not directly about programming, I will post it anyway.
One of the motivations for the Transformer was to achieve parallelism during training by replacing recurrent operations with self-attention and position-wise feed-forward layers.
However, inference still involves sequential behavior: the model generates one output symbol per step, conditioned on the previously generated symbols. In other words, the Transformer is an auto-regressive model when it decodes its internal representations into output symbols.
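To make the sequential dependency concrete, here is a minimal Python sketch. `toy_next_symbol` is a hypothetical stand-in for the decoder's next-symbol computation (a real model would run attention over the whole prefix); the point is only that step t cannot start before step t-1 has produced its output:

```python
def toy_next_symbol(prefix):
    # Hypothetical decoder step: a real Transformer would attend over
    # the prefix; here we just return (last symbol + 1) mod 5 so the
    # example stays self-contained and runnable.
    return (prefix[-1] + 1) % 5

def autoregressive_decode(start_symbol, steps):
    out = [start_symbol]
    for _ in range(steps):
        # Each iteration consumes the symbol emitted by the previous one,
        # so the loop cannot be parallelized across time steps.
        out.append(toy_next_symbol(out))
    return out

print(autoregressive_decode(0, 4))  # → [0, 1, 2, 3, 4]
```

A parallel (non-autoregressive) decoder would have to produce all positions in one pass, which means removing this dependence of each position on the previously emitted symbols.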
Is it possible to parallelize the decoding process by generating all output symbols simultaneously?
It would be very helpful if you could point me to research on this topic. Thanks in advance.