
I am using the models.vgg16(pretrained=True) model for image classification, where the number of classes is 3.

The batch size is 12:

    trainloader = torch.utils.data.DataLoader(train_data, batch_size=12, shuffle=True)

which matches the 12 in the error message Target size (torch.Size([12])) must be the same as input size (torch.Size([12, 1000])).

I changed the parameters of the last FC layer, so the last FC layer is now Linear(in_features=1000, out_features=3, bias=True).

The loss function is BCEWithLogitsLoss():

    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.SGD(vgg16.parameters(), lr=0.001, momentum=0.9)

The training code is:

        # zero the parameter gradients
        optimizer.zero_grad()
        outputs = vgg16(inputs)               #----> forward pass
        loss = criterion(outputs, labels)   #----> compute loss (error occurs here)
        loss.backward()                     #----> backward pass
        optimizer.step()                    #----> weights update

While computing the loss, I get this error: Target size (torch.Size([12])) must be the same as input size (torch.Size([12, 1000]))

The code is available at: https://github.com/irfanumar1994/vgg/blob/master/code.ipynb

Irfan Umar
2 Answers


Try double-checking how you modified the linear layer. It seems the model's forward pass does not actually go through your modification.

Your model outputs 1000 values per sample, while it should output 3. That is why the loss cannot be evaluated: you are comparing scores for 1000 classes against targets for 3 classes. Your last layer should have 3 outputs, and then it should work.
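
A quick sanity check is to push a dummy batch through the network and inspect the output shape (a minimal sketch; the 224x224 input size is just an example):

    import torch

    # dummy batch of 12 RGB images; the spatial size is only an example
    dummy = torch.randn(12, 3, 224, 224)
    print(vgg16(dummy).shape)  # torch.Size([12, 1000]) if the last layer was not actually replaced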

EDIT

From the code you shared (https://github.com/irfanumar1994/vgg/blob/master/code.ipynb), I think there are two problems.

First, you modified your model this way:

# Load the pretrained model from pytorch
vgg16 = models.vgg16(pretrained=True)

vgg16.classifier[6].in_features = 1000
vgg16.classifier[6].out_features = 3

What this actually does is overwrite the in_features and out_features attributes of the existing layer; it does not recreate the layer's weight matrix, so during the forward pass classifier[6] still computes its original 4096 -> 1000 mapping. To change the number of outputs, you have to replace the module itself (or modify the forward() function of a custom model).

Usually the proper way to do this is to define a new class, either one that inherits from the underlying model class (class MyVGG16(models.VGG)) or, more generally, from nn.Module (class MyVGG(nn.Module)), as sketched below. You can find further explanation in the following link
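
As a minimal sketch of the nn.Module approach (the class name MyVGG is illustrative, not from your code):

    import torch.nn as nn
    from torchvision import models

    class MyVGG(nn.Module):
        def __init__(self, num_classes=3):
            super().__init__()
            self.backbone = models.vgg16(pretrained=True)
            # replace the final 4096 -> 1000 Linear layer with 4096 -> num_classes
            self.backbone.classifier[6] = nn.Linear(4096, num_classes)

        def forward(self, x):
            return self.backbone(x)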

If that fails, try unsqueeze(1) on your targets (i.e. the labels variable), as in the snippet below. This is less likely to be the reason for the error, but worth a try.
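
For reference, that shape fix would look like this (note it only makes the shapes match when the model emits a single logit per sample, which is not your case):

    # labels: shape [12] -> [12, 1]
    loss = criterion(outputs, labels.float().unsqueeze(1))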

EDIT

Also try converting your target tensor to one-hot vectors, and change the tensor type to float, since BCEWithLogitsLoss expects float targets.
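
A minimal sketch of that conversion (assuming 3 classes, matching your setup):

    import torch.nn.functional as F

    # integer class labels of shape [12] -> one-hot float targets of shape [12, 3]
    y_onehot = F.one_hot(labels, num_classes=3).float()
    loss = criterion(outputs, y_onehot)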

A. Maman
  • code is shared at github.com/irfanumar1994/vgg/blob/master/code.ipynb, you can review the model architecture there – Irfan Umar Apr 30 '20 at 09:12

Share the code of your model and it will be easier to debug. The problem is surely in your last fully connected layer. The size mismatch says that you are getting 1000 features for each of the 12 images (batch size), while the target only has 12 values to compare against.

Clearly the fully connected layer is the problem.

Use this and you will solve the problem:

import torch.nn as nn
from torchvision import models

vgg16 = models.vgg16(pretrained=True)

# classifier[6] is the final Linear layer; replace it with a fresh 4096 -> 3 layer
vgg16.classifier[6] = nn.Linear(4096, 3)

if __name__ == "__main__":
    from torchsummary import summary
    model = vgg16.cuda()
    print(model)
    summary(model, input_size=(3, 120, 120))


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 120, 120]           1,792
              ReLU-2         [-1, 64, 120, 120]               0
            Conv2d-3         [-1, 64, 120, 120]          36,928
              ReLU-4         [-1, 64, 120, 120]               0
         MaxPool2d-5           [-1, 64, 60, 60]               0
            Conv2d-6          [-1, 128, 60, 60]          73,856
              ReLU-7          [-1, 128, 60, 60]               0
            Conv2d-8          [-1, 128, 60, 60]         147,584
              ReLU-9          [-1, 128, 60, 60]               0
        MaxPool2d-10          [-1, 128, 30, 30]               0
           Conv2d-11          [-1, 256, 30, 30]         295,168
             ReLU-12          [-1, 256, 30, 30]               0
           Conv2d-13          [-1, 256, 30, 30]         590,080
             ReLU-14          [-1, 256, 30, 30]               0
           Conv2d-15          [-1, 256, 30, 30]         590,080
             ReLU-16          [-1, 256, 30, 30]               0
        MaxPool2d-17          [-1, 256, 15, 15]               0
           Conv2d-18          [-1, 512, 15, 15]       1,180,160
             ReLU-19          [-1, 512, 15, 15]               0
           Conv2d-20          [-1, 512, 15, 15]       2,359,808
             ReLU-21          [-1, 512, 15, 15]               0
           Conv2d-22          [-1, 512, 15, 15]       2,359,808
             ReLU-23          [-1, 512, 15, 15]               0
        MaxPool2d-24            [-1, 512, 7, 7]               0
           Conv2d-25            [-1, 512, 7, 7]       2,359,808
             ReLU-26            [-1, 512, 7, 7]               0
           Conv2d-27            [-1, 512, 7, 7]       2,359,808
             ReLU-28            [-1, 512, 7, 7]               0
           Conv2d-29            [-1, 512, 7, 7]       2,359,808
             ReLU-30            [-1, 512, 7, 7]               0
        MaxPool2d-31            [-1, 512, 3, 3]               0
AdaptiveAvgPool2d-32            [-1, 512, 7, 7]               0
           Linear-33                 [-1, 4096]     102,764,544
             ReLU-34                 [-1, 4096]               0
          Dropout-35                 [-1, 4096]               0
           Linear-36                 [-1, 4096]      16,781,312
             ReLU-37                 [-1, 4096]               0
          Dropout-38                 [-1, 4096]               0
           Linear-39                    [-1, 3]          12,291
================================================================
Total params: 134,272,835
Trainable params: 134,272,835
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.16
Forward/backward pass size (MB): 62.84
Params size (MB): 512.21
Estimated Total Size (MB): 575.21
Pankaj Mishra
  • code is shared at https://github.com/irfanumar1994/vgg/blob/master/code.ipynb – Irfan Umar Apr 30 '20 at 09:05
  • I checked your code. The error is: the output of layer 3 is 4096, while the in_features of layer 6 is set to 1000, which is wrong; it should match the output of layer 3. Just change the in_features of layer 6 and everything looks OK. – Pankaj Mishra Apr 30 '20 at 09:13
  • I added the code above to solve your problem. This needs to be done. – Pankaj Mishra Apr 30 '20 at 09:49
  • Thanks. Besides this, one-hot encoding and converting to float worked: y_onehot = nn.functional.one_hot(labels, num_classes=3); y_onehot = y_onehot.float() – Irfan Umar Apr 30 '20 at 10:24