I've got an encoder-decoder model for character-level English spelling correction. It's pretty basic: a two-LSTM encoder followed by an LSTM decoder.
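To make the setup concrete, here is a minimal sketch of that architecture in Keras (the vocabulary size, embedding size, and hidden size are placeholders, not my exact values):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 30   # placeholder: letters + padding/SOS/EOS ids
EMBED_DIM = 64    # placeholder embedding size
HIDDEN = 128      # placeholder LSTM size

# Encoder: embedding followed by two stacked LSTMs.
enc_inputs = layers.Input(shape=(None,), name="encoder_input")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(enc_inputs)
x = layers.LSTM(HIDDEN, return_sequences=True)(x)
_, state_h, state_c = layers.LSTM(HIDDEN, return_state=True)(x)

# Decoder: a single LSTM initialised with the encoder's final state.
dec_inputs = layers.Input(shape=(None,), name="decoder_input")
y = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(dec_inputs)
y = layers.LSTM(HIDDEN, return_sequences=True)(y, initial_state=[state_h, state_c])
dec_outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(y)

model = Model([enc_inputs, dec_inputs], dec_outputs)
```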
Up until now I have been pre-padding the encoder input sequences, like below:
abc -> -abc
defg -> defg
ad -> --ad
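In code, that pre-padding is roughly the following (a sketch assuming Keras' `pad_sequences` and a hypothetical `char_to_id` mapping, with 0 reserved for padding):

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical character-to-index mapping; 0 is reserved for padding.
char_to_id = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}

words = ["abc", "defg", "ad"]
encoded = [[char_to_id[c] for c in w] for w in words]

# padding='pre' pads on the left, matching the '-abc' / '--ad' example above.
encoder_input = pad_sequences(encoded, maxlen=4, padding="pre", value=0)
print(encoder_input)
# [[0 1 2 3]
#  [4 5 6 7]
#  [0 0 1 4]]
```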
Next, I have been splitting the data into groups that share the same decoder input length, e.g.
train_data = {'15': [...], '16': [...], ...}
where the key is the length of the decoder input, and I train the model once for each length group in a loop.
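The grouping itself is just bucketing by target length, roughly like this (a simplified sketch with toy data, ignoring the SOS/EOS characters):

```python
from collections import defaultdict

def bucket_by_length(pairs):
    """Group (misspelled, correct) pairs by the length of the target word."""
    buckets = defaultdict(list)
    for src, tgt in pairs:
        buckets[str(len(tgt))].append((src, tgt))
    return dict(buckets)

pairs = [("helo", "hello"), ("wrld", "world"), ("teh", "the")]
train_data = bucket_by_length(pairs)
# {'5': [('helo', 'hello'), ('wrld', 'world')], '3': [('teh', 'the')]}
```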
There has to be a better way to do this, though, such as padding after the EOS character (or before the SOS character). But if I do that, how would I change the loss function so that the padding isn't counted towards the loss?
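What I have in mind is a loss that masks out the padded positions, something like the sketch below (assuming TensorFlow/Keras, sparse integer targets of shape `(batch, timesteps)`, and that the padding character has id 0), but I'm not sure whether this is the right approach:

```python
import tensorflow as tf

PAD_ID = 0  # assumed id of the padding character

def masked_sparse_ce(y_true, y_pred):
    """Sparse categorical cross-entropy that ignores padded positions."""
    loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
    # 1.0 where the target is a real character, 0.0 where it is padding.
    mask = tf.cast(tf.not_equal(y_true, PAD_ID), loss.dtype)
    loss = loss * mask
    # Average only over the non-padded positions.
    return tf.reduce_sum(loss) / tf.maximum(tf.reduce_sum(mask), 1.0)
```

The idea would be to pass this to `model.compile(optimizer='adam', loss=masked_sparse_ce)` instead of the plain sparse categorical cross-entropy. Is that the usual way to handle it, or is there a cleaner approach?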