
This is a continuation of this problem. I ironed out the earlier problems, but now I am running into another issue. Would anyone be able to help me with this?

It looks like the predicted mask and the actual mask have different sizes?

The traceback is below:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_18/459131192.py in <module>
     25             with torch.set_grad_enabled(phase == "train"):
     26                 y_pred = unet(x)
---> 27                 loss = dsc_loss(y_pred, y_true)
     28                 running_loss += loss.item()
     29 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/tmp/ipykernel_18/3969884729.py in forward(self, y_pred, y_true)
      6 
      7     def forward(self, y_pred, y_true):
----> 8         assert y_pred.size() == y_true.size()
      9         y_pred = y_pred[:, 0].contiguous().view(-1)
     10         y_true = y_true[:, 0].contiguous().view(-1)

AssertionError: 
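
For reference, `dsc_loss` is a Dice loss whose `forward` appears in the traceback above. A minimal sketch of the full class (the `smooth` constant and the `1 - dsc` return value are assumptions based on the repository linked below):

import torch.nn as nn

class DiceLoss(nn.Module):
    def __init__(self, smooth=1.0):
        super(DiceLoss, self).__init__()
        self.smooth = smooth

    def forward(self, y_pred, y_true):
        # Fails when prediction and target shapes differ
        assert y_pred.size() == y_true.size()
        # Only channel 0 of each tensor is used, flattened to 1D
        y_pred = y_pred[:, 0].contiguous().view(-1)
        y_true = y_true[:, 0].contiguous().view(-1)
        intersection = (y_pred * y_true).sum()
        dsc = (2.0 * intersection + self.smooth) / (
            y_pred.sum() + y_true.sum() + self.smooth
        )
        return 1.0 - dsc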

The U-Net model is below. Please have a look.

unet_network.py:

#Unet
#https://github.com/mateuszbuda/brain-segmentation-pytorch

from collections import OrderedDict

import torch
import torch.nn as nn

class UNet(nn.Module):

    def __init__(self, in_channels=3, out_channels=1, init_features=8):
        super(UNet, self).__init__()

        features = init_features
        self.encoder1 = UNet._block(in_channels, features, name="enc1")
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.encoder2 = UNet._block(features, features * 2, name="enc2")
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.encoder3 = UNet._block(features * 2, features * 4, name="enc3")
        self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.encoder4 = UNet._block(features * 4, features * 8, name="enc4")
        self.pool4 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.bottleneck = UNet._block(features * 8, features * 16, name="bottleneck")

        self.upconv4 = nn.ConvTranspose2d(
            features * 16, features * 8, kernel_size=2, stride=2
        )
        self.decoder4 = UNet._block((features * 8) * 2, features * 8, name="dec4")
        self.upconv3 = nn.ConvTranspose2d(
            features * 8, features * 4, kernel_size=2, stride=2
        )
        self.decoder3 = UNet._block((features * 4) * 2, features * 4, name="dec3")
        self.upconv2 = nn.ConvTranspose2d(
            features * 4, features * 2, kernel_size=2, stride=2
        )
        self.decoder2 = UNet._block((features * 2) * 2, features * 2, name="dec2")
        self.upconv1 = nn.ConvTranspose2d(
            features * 2, features, kernel_size=2, stride=2
        )
        self.decoder1 = UNet._block(features * 2, features, name="dec1")

        self.conv = nn.Conv2d(
            in_channels=features, out_channels=out_channels, kernel_size=1
        )

    def forward(self, x):
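        # Contracting path: encoder conv blocks with 2x2 max-pooling between them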
        enc1 = self.encoder1(x)
        enc2 = self.encoder2(self.pool1(enc1))
        enc3 = self.encoder3(self.pool2(enc2))
        enc4 = self.encoder4(self.pool3(enc3))

        bottleneck = self.bottleneck(self.pool4(enc4))

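        # Expanding path: upsample, concatenate the matching encoder output
        # (skip connection), then apply a conv block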
        dec4 = self.upconv4(bottleneck)
        dec4 = torch.cat((dec4, enc4), dim=1)
        dec4 = self.decoder4(dec4)
        dec3 = self.upconv3(dec4)
        dec3 = torch.cat((dec3, enc3), dim=1)
        dec3 = self.decoder3(dec3)
        dec2 = self.upconv2(dec3)
        dec2 = torch.cat((dec2, enc2), dim=1)
        dec2 = self.decoder2(dec2)
        dec1 = self.upconv1(dec2)
        dec1 = torch.cat((dec1, enc1), dim=1)
        dec1 = self.decoder1(dec1)
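        # 1x1 convolution to out_channels, then sigmoid for per-pixel probabilities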
        return torch.sigmoid(self.conv(dec1))

    @staticmethod
    def _block(in_channels, features, name):
        return nn.Sequential(
            OrderedDict(
                [
                    (
                        name + "conv1",
                        nn.Conv2d(
                            in_channels=in_channels,
                            out_channels=features,
                            kernel_size=3,
                            padding=1,
                            bias=False,
                        ),
                    ),
                    (name + "norm1", nn.BatchNorm2d(num_features=features)),
                    (name + "relu1", nn.ReLU(inplace=True)),
                    (
                        name + "conv2",
                        nn.Conv2d(
                            in_channels=features,
                            out_channels=features,
                            kernel_size=3,
                            padding=1,
                            bias=False,
                        ),
                    ),
                    (name + "norm2", nn.BatchNorm2d(num_features=features)),
                    (name + "relu2", nn.ReLU(inplace=True)),
                ]
            )
        )
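
As a quick sanity check with the default settings (illustrative shapes):

unet = UNet(in_channels=3, out_channels=1)
x = torch.randn(5, 3, 512, 512)  # a batch of five 3-channel 512x512 crops
print(unet(x).shape)             # torch.Size([5, 1, 512, 512])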


Thanks & Best Regards

Schroter Michael

  • what is the `shape` of your input batch, your prediction (Unet's output) and the target? – Shai Sep 21 '22 at 06:32
  • Hi, all input images and mask images are 2308x2308, but there is some resizing that brings them to 1024x1024, and then some images are randomly cropped to 512x512. The dataset is split into train and validation. The target is one image and it is 3000x3000. The issue occurs at the training stage, where `PadIfNeeded(min_height=1024, min_width=1024, border_mode=cv2.BORDER_CONSTANT, always_apply=True)` is also applied. Thanks & Best Regards Schroter Michael – Alain Michael Janith Schroter Sep 21 '22 at 07:15
  • 1. how many channels do you have for input, pred and target? – Shai Sep 21 '22 at 07:22
  • 2. Your image and target have different resolutions (2308 vs 3000) - that is very odd. How do you "match" each _pixel_ to its desired label? How do you ensure that after all augmentations the target is "aligned" with the input batch? – Shai Sep 21 '22 at 07:24
  • Hi, thanks. All images and mask images have 3 channels. The evaluation of the target happens further down the pipeline; the issue occurs at the training stage. The validation set and the training data are picked from the same input data. Thanks & Best Regards Schroter Michael – Alain Michael Janith Schroter Sep 21 '22 at 07:29
  • The line that causes the issue is the computation of the loss function: `loss = dsc_loss(y_pred, y_true)`. Can you add a print just before this line: `print(f'x={x.shape}, y_pred={y_pred.shape}, y_true={y_true.shape}')`? You claim that `x` should have a shape of 3x512x512, and so should `y_pred` and `y_true`. The assertion you have says otherwise. It would be interesting to see what is going on. – Shai Sep 21 '22 at 07:35
  • 1
    Hi, Thanks please find the following: `x=torch.Size([5, 3, 512, 512]), pred=torch.Size([5, 1, 512, 512]), y_true=torch.Size([5, 3, 512, 512])` Thanks & Best Regards Schroter Michael – Alain Michael Janith Schroter Sep 21 '22 at 07:49

1 Answer


Your error stems from the difference in the number of channels between the prediction (`pred=torch.Size([5, 1, 512, 512])`) and the target (`y_true=torch.Size([5, 3, 512, 512])`).

For a target with 3 channels, you need your prediction to have three channels as well. That is, you need to configure your `UNet` with `out_channels=3` instead of the default of 1. For example:
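
A minimal sketch of the fix (the random tensors here are stand-ins for your real batch and mask):

unet = UNet(in_channels=3, out_channels=3)
x = torch.randn(5, 3, 512, 512)                         # input batch
y_true = torch.randint(0, 2, (5, 3, 512, 512)).float()  # stand-in mask
y_pred = unet(x)                                        # [5, 3, 512, 512]
assert y_pred.size() == y_true.size()                   # now holds

Note that the Dice loss in the traceback only reads channel 0 of each tensor (`y_pred[:, 0]`), so it is the shape assertion that requires the matching number of channels.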
