
I'm working with MXNet and trying to figure out the Seq2Seq model. Suppose every batch holds 32 sequences and every sequence is 20 timesteps long. To build a seq2seq architecture, we split every sequence into two parts. The split point is arbitrary, but let's say we divide each sequence in half (there's a sketch of this split after the code below). The first part, which I'll call the 'encoder input', is fed to the encoder and consists of 10 timesteps; in other words, it is N variables of length 10. So for every encoder input sequence we have x1, ..., x10, multiplied by the number of features, which gives the feature vector of encoder inputs Xt.

Now, since the decoder output is the second half of the sequence, what should the decoder input be? I'm setting the decoder input equal to the encoder input, and the model works fairly well. This is the forward function:

import mxnet as mx

def forward(self, encoder_input, *args):
    # One zero-initialized encoder state per sequence in the batch.
    state = self.encoder.begin_state(batch_size=encoder_input.shape[0], ctx=mx.cpu())
    encoder_output, encoder_state = self.encoder(encoder_input, state)
    # The decoder reuses the encoder input as its input and is initialized
    # with the encoder's final state.
    decoder_output, decoder_state = self.decoder(encoder_input, encoder_state)
    output = self.dense(decoder_output)
    return output
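
For concreteness, here is roughly how I split each batch (the feature count of 4 is just for illustration):

    # 32 sequences, 20 timesteps, 4 features (4 is illustrative).
    batch = mx.nd.random.uniform(shape=(32, 20, 4))

    # First half of each sequence goes into the encoder; the second half
    # is what the decoder is trained to predict.
    encoder_input = batch[:, :10, :]    # shape (32, 10, 4)
    decoder_target = batch[:, 10:, :]   # shape (32, 10, 4)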

Is there anything wrong with using the encoder input as the decoder input? I've seen some examples in Keras where the decoder input is initialized as an np.array with the shape of the decoder output. I've tried setting the decoder input to an array of zeros, but the results (in terms of accuracy) degrade really badly; the sketch below shows roughly what that variant looked like.
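
What I tried, mirroring those Keras examples, looked roughly like this variant of the forward function (a sketch, not my exact code):

    def forward(self, encoder_input, *args):
        state = self.encoder.begin_state(batch_size=encoder_input.shape[0], ctx=mx.cpu())
        encoder_output, encoder_state = self.encoder(encoder_input, state)
        # All-zeros decoder input with the same shape as the decoder target
        # (here both halves of the sequence have identical shape).
        decoder_input = mx.nd.zeros(encoder_input.shape, ctx=mx.cpu())
        decoder_output, decoder_state = self.decoder(decoder_input, encoder_state)
        return self.dense(decoder_output)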


1 Answer


I found this in 'Hands-On Machine Learning':

In other words, the decoder is given as input the word that it should have output at the previous step (regardless of what it actually output). For the very first word, it is given the start-of-sequence (SOS) token. The decoder is expected to end the sentence with an end-of-sequence (EOS) token.

Therefore, I suppose that if the encoder input is composed of the first n observations of the z features, then no matter what the encoder actually outputs, we should feed the decoder with the encoder states together with a decoder input equal to the output the decoder is expected to produce, in other words the first n observations of the label sequence (the technique the book is describing, teacher forcing). Despite all that, in my Python experiments there is no evidence of better results. Maybe feeding the decoder with the labels works better when there are a lot of features.
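
For a numeric seq2seq like the one in the question, building that teacher-forcing decoder input might look like the sketch below (shapes follow the question; using an all-zero timestep as the SOS stand-in is my assumption, and feeding the last encoder timestep instead is another common choice):

    import mxnet as mx

    # Ground-truth second half of the sequences: 32 sequences,
    # 10 timesteps, 4 features (the feature count is illustrative).
    decoder_target = mx.nd.random.uniform(shape=(32, 10, 4))

    # Numeric stand-in for the start-of-sequence (SOS) token.
    sos = mx.nd.zeros((32, 1, 4))

    # Teacher forcing: at step t the decoder sees the ground-truth value it
    # should have produced at step t-1, i.e. the target shifted right by one.
    decoder_input = mx.nd.concat(sos, decoder_target[:, :-1, :], dim=1)  # (32, 10, 4)

At inference time the labels are not available, so the decoder would instead be fed its own prediction from the previous step, one timestep at a time.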
