
First of all, I am playing around with Python/PyTorch/LSTM for no good reason. Just curious is all. I "think" I have all of my tensors configured correctly. My window size is 20. My batch size is 64.

These are their shapes when I feed them to my Dataset class:

    stock_train_tensor shape is:  torch.Size([4688, 20, 1])
    stock_validate_tensor shape is:  torch.Size([1172, 20, 1])
    stock_train_target_tensor shape is:  torch.Size([4688, 1])
    stock_validate_target_tensor shape is:  torch.Size([1172, 1])

My Dataset is:

    class StockPriceDataSet(Dataset):
        def __init__(self, data, targets):
            self.data = data
            self.targets = targets

        def __getitem__(self, index):
            x = self.data[index]
            y = self.targets[index]
            return x, y
     
        def __len__(self):
            return len(self.data)

Then I do this:

    training_dataset = StockPriceDataSet(stock_train_tensor, stock_train_target_tensor)
    validation_dataset= StockPriceDataSet(stock_validate_tensor, stock_validate_target_tensor)

    train_dataloader = DataLoader(training_dataset, batch_size=64, shuffle=False)
    validate_dataloader = DataLoader(validation_dataset, batch_size=64, shuffle=False)

My LSTM model is configured as follows.

    lstm = nn.LSTM(input_size=1,  hidden_size=64, num_layers=2, batch_first=True)
    criterion = nn.MSELoss()
    optimizer = optim.SGD(lstm.parameters(), lr=0.01)

And then when I try to train the model:

    for epoch in range(config["training"]["num_epoch"]):
        for i, (x, y) in enumerate(train_dataloader):
            output, _ = lstm(x)
            y = y.float()
            time.sleep(6)
            # compute the loss and backpropagate
            loss = criterion(output, y)  # <===== this causes the warning

I get this warning:

    loss.py:536: UserWarning: Using a target size (torch.Size([64, 1])) that is different to the input size (torch.Size([64, 20, 64])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

I put some debug print statements and this is what I see right before the warning message.

    output is shape:  torch.Size([64, 20, 64])
    y is shape:  torch.Size([64, 1])
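
As a sanity check, a standalone LSTM with the same configuration produces the same shapes (a minimal sketch; the random input just stands in for one of my batches), so the [64, 20, 64] comes from the LSTM itself and not from my data pipeline:

    import torch
    import torch.nn as nn

    # same configuration as above
    lstm = nn.LSTM(input_size=1, hidden_size=64, num_layers=2, batch_first=True)

    x = torch.randn(64, 20, 1)          # (batch, seq_len, input_size)
    output, (h_n, c_n) = lstm(x)

    print(output.shape)  # torch.Size([64, 20, 64]) -> (batch, seq_len, hidden_size)
    print(h_n.shape)     # torch.Size([2, 64, 64])  -> (num_layers, batch, hidden_size)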

I tried repeating the target value across dim=1 so that my target tensor was also [64, 20]. That resulted in the same message.

    loss.py:536: UserWarning: Using a target size (torch.Size([64, 20])) that is different to the input size (torch.Size([64, 20, 64])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

And when I tried adding a third dimension, I got this:

    loss.py:536: UserWarning: Using a target size (torch.Size([64, 20, 1])) that is different to the input size (torch.Size([64, 20, 64])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

Any clue on how to debug this would be greatly appreciated.

Thanks.

George

I just figured out that the torch.nn.LSTM module uses hidden_size (hidden_size * 1, or 2 if bidirectional) to set the 3rd dimension of the output tensor. So in my case, the output always comes out as [64, 20, 64]. I just found a bit in the docs that says "unless proj_size > 0". I'm trying that now. At least I've changed the warning message: rnn.py:812: UserWarning: LSTM with projections is not supported with oneDNN. Using default implementation. – GeorgeIV Apr 09 '23 at 07:10
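
For reference, a minimal sketch of what proj_size changes, based on the nn.LSTM docs (the oneDNN warning above just reports a fallback to PyTorch's default implementation, not an error):

    import torch
    import torch.nn as nn

    # with proj_size > 0, the last output dimension becomes proj_size instead of hidden_size
    lstm = nn.LSTM(input_size=1, hidden_size=64, num_layers=2,
                   batch_first=True, proj_size=1)

    x = torch.randn(64, 20, 1)
    output, _ = lstm(x)
    print(output.shape)  # torch.Size([64, 20, 1]) -> (batch, seq_len, proj_size)
    # the last time step, output[:, -1, :], then has shape [64, 1] and matches the targets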

1 Answer


You are comparing the raw output of the LSTM layer against your labels without first reshaping it. You can either add a fully-connected layer on top of the pooled/flattened LSTM output, or use only the last time step of the LSTM output (mapped down to a single value, e.g. with a small linear head) for prediction. The meanings of the LSTM outputs are explained in the nn.LSTM documentation.

Use Last Output of LSTM in Training

    # a small linear head maps the 64-dim hidden state of the last time step to one prediction
    head = nn.Linear(64, 1)
    optimizer = optim.SGD(list(lstm.parameters()) + list(head.parameters()), lr=0.01)

    for epoch in range(config["training"]["num_epoch"]):
        for i, (x, y) in enumerate(train_dataloader):
            output, _ = lstm(x)                          # output: [64, 20, 64]
            y = y.float()
            time.sleep(6)
            # compute the loss and backpropagate
            loss = criterion(head(output[:, -1, :]), y)  # last time step: [64, 64] -> [64, 1]
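
Slicing the last time step works because, for a unidirectional LSTM, output[:, -1, :] is the final hidden state of the top layer (the same tensor h_n[-1] returns), so it already summarizes the whole 20-step window; the small linear head just maps its 64 features down to the single target value.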

Using LSTM with Fully-Connected Layers after Flattening/Pooling

    class LSTM_Model_w_Flat(nn.Module):
        """ LSTM model that flattens the LSTM outputs and feeds them into a fully-connected layer.
        """
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=64, num_layers=2, batch_first=True)  # same as the LSTM being used
            self.flat = nn.Flatten()
            # fully-connected layer
            self.dense = nn.Linear(64*20, 1)  # the LSTM output [64, 20, 64] becomes [64, 20*64] after flattening

        def forward(self, x):
            x, *_ = self.lstm(x)   # ignore (h_n, c_n)
            x = self.flat(x)
            return self.dense(x)

    class LSTM_Model_w_Pooling(nn.Module):
        """ LSTM model that average-pools the LSTM outputs and feeds them into a fully-connected layer.
        """
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=64, num_layers=2, batch_first=True)  # same as the LSTM being used
            # fully-connected layer
            self.dense = nn.Linear(20, 1)  # the LSTM output [64, 20, 64] becomes [64, 20] after pooling

        def forward(self, x):
            x, *_ = self.lstm(x)        # ignore (h_n, c_n)
            x = torch.mean(x, dim=-1)   # average over the 64 hidden features
            return self.dense(x)

Then, your training loop becomes:

    model = LSTM_Model_w_Flat()  # or LSTM_Model_w_Pooling() for the pooling variant
    for epoch in range(config["training"]["num_epoch"]):
        for i, (x, y) in enumerate(train_dataloader):
            output = model(x)
            y = y.float()
            time.sleep(6)
            # compute the loss and backpropagate
            loss = criterion(output, y)
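
Both snippets stop at the loss computation, like the code in the question. A minimal sketch of the remaining standard steps, assuming the optimizer is rebuilt over model.parameters() so the new dense layer is trained too:

    optimizer = optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(config["training"]["num_epoch"]):
        for i, (x, y) in enumerate(train_dataloader):
            output = model(x)
            y = y.float()
            loss = criterion(output, y)

            optimizer.zero_grad()  # clear gradients accumulated in the previous step
            loss.backward()        # backpropagate
            optimizer.step()       # update the weights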