
I've got an encoder-decoder model for character-level English spelling correction. It's pretty basic: a two-layer LSTM encoder and another LSTM decoder.

However, up until now, I have been pre-padding the encoder input sequences, like below:

abc  -> -abc
defg -> defg
ad   -> --ad
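
In code, the pre-padding is roughly the sketch below (the char2idx mapping and PAD_ID are just illustrative, not my actual vocabulary; passing padding='post' would pad after the sequence instead):

from tensorflow.keras.preprocessing.sequence import pad_sequences

PAD_ID = 0
char2idx = {c: i + 1 for i, c in enumerate('abcdefg')}  # toy vocabulary

words = ['abc', 'defg', 'ad']
encoded = [[char2idx[c] for c in w] for w in words]
padded = pad_sequences(encoded, maxlen=4, padding='pre', value=PAD_ID)
# [[0 1 2 3]
#  [4 5 6 7]
#  [0 0 1 4]]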

Next, I have been splitting the data into several groups by decoder input length, e.g.

train_data = {'15': [...], '16': [...], ...}

where the key is the decoder input length, and I have been training the model once for each length in a loop, roughly as sketched below.
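
The per-length loop looks something like this (the array names are only illustrative; each group holds encoder inputs, decoder inputs and decoder targets of one fixed length):

for length, (enc_in, dec_in, dec_target) in train_data.items():
    model.fit([enc_in, dec_in], dec_target, batch_size=64, epochs=1)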

However, there has to be a better way to do this, such as padding after the EOS character or before the SOS character. But if I do that, how would I change the loss function so that the padding isn't counted in the loss?

hhaefliger

1 Answer

The standard way of doing padding is putting it after the end-of-sequence token, but it should not really matter where the padding is, as long as it is masked out of the loss.

The trick for not including the padded positions in the loss is to mask them out before reducing the loss. Assuming the PAD_ID variable contains the vocabulary index of the padding symbol and the decoder targets are one-hot encoded:

from tensorflow.keras import backend as K

def custom_loss(y_true, y_pred):
    labels = K.argmax(y_true, axis=-1)  # y_true is one-hot; recover the token ids
    mask = 1 - K.cast(K.equal(labels, PAD_ID), K.floatx())  # 0.0 at padded positions
    loss = K.categorical_crossentropy(y_true, y_pred) * mask
    return K.sum(loss) / K.sum(mask)  # average only over non-padded positions
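
The function is then passed to compile like any built-in loss (the optimizer here is just an example):

model.compile(optimizer='adam', loss=custom_loss)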
Jindřich