I'm working with MXNet and trying to figure out the Seq2Seq model. Suppose every batch holds 32 sequences and every sequence is 20 timesteps long. To build a seq2seq architecture we split every sequence into two parts; the split point is fairly arbitrary, but let's say we divide each sequence in half. The first part, which I'll call the 'encoder input', is fed to the encoder and is a sequence of 10 timesteps, so for every encoder input sequence we have x1, …, x10, and since each timestep carries several features, every xt is a feature vector of encoder inputs.

Now, since the decoder output will be the second half of the sequence, what should the decoder input be? I'm currently setting the decoder input equal to the encoder input, and the model works fairly well. Here's the forward function:
def forward(self, encoder_input, *args):
    # Fresh hidden state for this batch.
    state = self.encoder.begin_state(batch_size=encoder_input.shape[0], ctx=mx.cpu())
    encoder_output, encoder_state = self.encoder(encoder_input, state)
    # Reuse the encoder input as the decoder input, conditioning the
    # decoder on the encoder's final state.
    decoder_output, decoder_state = self.decoder(encoder_input, encoder_state)
    output = self.dense(decoder_output)
    return output
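To make the shapes concrete, here is roughly how I'm splitting the data (a minimal sketch; full_sequences is just a placeholder name, and I'm assuming a batch-major NTC layout with one feature per step):

import mxnet as mx

# Toy batch: 32 sequences of 20 timesteps with 1 feature each (NTC layout).
full_sequences = mx.nd.random.uniform(shape=(32, 20, 1))

# The first half (10 steps) goes to the encoder; the second half is what
# the decoder is supposed to predict.
encoder_input = full_sequences[:, :10, :]   # shape (32, 10, 1)
decoder_target = full_sequences[:, 10:, :]  # shape (32, 10, 1)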
Is there anything wrong with using the encoder input as the decoder input? I've seen some examples in Keras where they initialize the decoder input as an np.array with the same shape as the decoder output. I've tried setting the decoder input to an array of zeros, but the results (in terms of accuracy) degrade badly.
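For reference, my understanding of those Keras examples is that they do teacher forcing: the decoder input is the target sequence shifted right by one step, with a start-of-sequence step (often zeros) in front. A minimal sketch of that idea, with placeholder shapes matching the split above:

import mxnet as mx

# decoder_target: the second half of each sequence, as in the sketch above.
decoder_target = mx.nd.random.uniform(shape=(32, 10, 1))

# Teacher forcing: prepend a zero "start" step and drop the last target
# step, so at step t the decoder sees the ground-truth value from t-1.
start_step = mx.nd.zeros((32, 1, 1))
decoder_input = mx.nd.concat(start_step, decoder_target[:, :-1, :], dim=1)  # (32, 10, 1)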