PyTorch inconsistent size with pad_packed_sequence, seq2seq

Question

I'm seeing some inconsistencies in the output of an encoder I got from this GitHub repository.

The encoder looks as follows:

import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class Encoder(nn.Module):
    r"""Applies a multi-layer LSTM to a variable-length input sequence.
    """

    def __init__(self, input_size, hidden_size, num_layers,
                 dropout=0.0, bidirectional=True, rnn_type='lstm'):
        super(Encoder, self).__init__()
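        # Store the constructor arguments instead of hard-coded values.
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.bidirectional = bidirectional
        self.rnn_type = rnn_type
        self.dropout = dropout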
        if self.rnn_type == 'lstm':
            self.rnn = nn.LSTM(input_size, hidden_size, num_layers,
                               batch_first=True,
                               dropout=dropout,
                               bidirectional=bidirectional)

    def forward(self, padded_input, input_lengths):
        """
        Args:
            padded_input: N x T x D
            input_lengths: N
        Returns: output, hidden
            - **output**: N x T x H
            - **hidden**: (num_layers * num_directions) x N x H
        """
        total_length = padded_input.size(1)  # get the max sequence length
        packed_input = pack_padded_sequence(padded_input, input_lengths,
                                            batch_first=True,
                                            enforce_sorted=False)
        packed_output, hidden = self.rnn(packed_input)
        output, _ = pad_packed_sequence(packed_output, batch_first=True,
                                        total_length=total_length)
        return output, hidden

So it consists only of an LSTM. If I print the encoder's RNN, this is the output:

LSTM(40, 512, num_layers=8, batch_first=True, bidirectional=True)

So it should have a 512-sized output, right? But when I feed it a tensor of size torch.Size([16, 1025, 40]) (16 samples of 1025 vectors of size 40, packed to fit the RNN), the output I get from the RNN has an encoded size of 1024, torch.Size([16, 1025, 1024]), when it should have been encoded to 512, right?

Is there something I'm missing?

score 2 · Accepted Answer · answered May 12 '20 at 11:42

Setting bidirectional=True makes the LSTM bidirectional, which means there will be two LSTMs, one that goes from left to right and the other that goes from right to left.

From the nn.LSTM documentation - Outputs:

output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.

For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.

Your output has the size [batch, seq_len, 2 * hidden_size] because the LSTM is bidirectional (batch and seq_len are swapped relative to the documentation because you set batch_first=True). With hidden_size=512, the last dimension is 2 * 512 = 1024. The outputs of the two directions are concatenated so that each time step carries information from both, and you can easily separate them if you want to treat them differently.
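As a rough sketch of the separation described above, the snippet below runs a small bidirectional LSTM on a packed batch and splits the padded output into its two directions. The sizes, the lengths, and the variable names (forward_out, backward_out) are illustrative assumptions, not the asker's actual setup; reshape is used instead of view only to avoid relying on the tensor being contiguous.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Small illustrative sizes (assumed; the question uses 16 x 1025 x 40, H=512).
batch, seq_len, input_size, hidden_size = 4, 7, 5, 3
num_directions = 2  # because bidirectional=True

rnn = nn.LSTM(input_size, hidden_size, num_layers=2,
              batch_first=True, bidirectional=True)

padded_input = torch.randn(batch, seq_len, input_size)
input_lengths = torch.tensor([7, 5, 3, 2])  # true lengths, kept on the CPU

packed_input = pack_padded_sequence(padded_input, input_lengths,
                                    batch_first=True, enforce_sorted=False)
packed_output, (h_n, c_n) = rnn(packed_input)
output, _ = pad_packed_sequence(packed_output, batch_first=True,
                                total_length=seq_len)

# Last dimension is num_directions * hidden_size, here 2 * 3.
print(output.shape)

# Split the concatenated features: index 0 is forward, index 1 is backward.
directions = output.reshape(batch, seq_len, num_directions, hidden_size)
forward_out = directions[:, :, 0, :]   # batch x seq_len x hidden_size
backward_out = directions[:, :, 1, :]  # batch x seq_len x hidden_size
```

Each half then has hidden_size features per time step, which is the size the question expected to see.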
