
Following the example from the PyTorch docs, I am trying to handle sequences whose padding is inconsistent rather than always at the end of each tensor in the batch (in other words, no pun intended, I have both a left-censored and a right-censored problem across my batches):

import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Data structure example from the docs: padding only at the end
seq = torch.tensor([[1, 2, 0], [3, 0, 0], [4, 5, 6]])
# Data structure of my problem: padding on either side
inconsistent_seq = torch.tensor([[1, 2, 0], [0, 3, 0], [0, 5, 6]])

lens = ...?
packed = pack_padded_sequence(seq, lens, batch_first=True, enforce_sorted=False)

How can I mask these padded 0s when running the sequences through an LSTM, preferably using built-in PyTorch functionality?


1 Answer


I "solved" this by essentially reindexing my data and padding left-censored data with 0's (makes sense for my problem). I also injected and extra tensor to the input dimension to track this padding. I then masked the right-censored data using the pack_padded_sequence method from the PyTorch library. Found a good source here:

https://www.kdnuggets.com/2018/06/taming-lstms-variable-sized-mini-batches-pytorch.html
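Roughly, the idea looks like this. This is a minimal sketch rather than my exact code: the length computation, the padding-indicator channel, and the toy LSTM dimensions are illustrative.

import torch
from torch.nn.utils.rnn import pack_padded_sequence

inconsistent_seq = torch.tensor([[1, 2, 0], [0, 3, 0], [0, 5, 6]])

# Length of each row = position of its last real token + 1, so leading
# (left-censored) zeros stay inside the sequence and only trailing
# zeros get masked away by pack_padded_sequence.
mask = inconsistent_seq != 0
lens = inconsistent_seq.size(1) - mask.flip(dims=[1]).int().argmax(dim=1)
# lens: tensor([2, 2, 3])

# Extra feature channel flagging padded positions, so the LSTM can
# distinguish a genuine 0 value from left-censored padding.
pad_flag = (inconsistent_seq == 0).float()
features = torch.stack([inconsistent_seq.float(), pad_flag], dim=-1)
# features shape: (batch=3, time=3, input_size=2)

packed = pack_padded_sequence(
    features, lens.cpu(), batch_first=True, enforce_sorted=False
)

lstm = torch.nn.LSTM(input_size=2, hidden_size=8, batch_first=True)
out, (h_n, c_n) = lstm(packed)

The indicator channel is also set for trailing zeros, but those steps are dropped by the packing anyway, so only the left-censored padding is actually seen by the LSTM.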
