
I’m new to segmentation models. I would like to use the deeplabv3_resnet50 model. My image has shape (256, 256, 3) and my label has shape (256, 256). Each pixel in my label has a class value (0-4), and the batch size set in the DataLoader is 32. Therefore, the shape of my input batch is [32, 3, 256, 256] and the shape of the corresponding target is [32, 256, 256]. I believe this is correct.
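For reference, here is roughly how I build the model and run one forward pass (a minimal sketch with random placeholder tensors; torchvision's segmentation models return a dict, so I read the `'out'` key):

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(num_classes=5)  # 5 classes: pixel values 0-4

images = torch.randn(32, 3, 256, 256)          # stands in for one batch from my DataLoader
targets = torch.randint(0, 5, (32, 256, 256))  # per-pixel class labels

output = model(images)["out"]  # torchvision segmentation models return a dict of tensors
print(output.shape)            # torch.Size([32, 5, 256, 256])
```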

I was trying to use nn.BCEWithLogitsLoss().

  1. Is this the correct loss function for my case? Or should I use CrossEntropy instead?
  2. If this is the right one, the output of my model is [32, 5, 256, 256]. Each image prediction has the shape [5, 256, 256]; does layer 0 mean the unnormalized probabilities of class 0? In order to make a [32, 256, 256] tensor that matches the target to feed into BCEWithLogitsLoss, do I need to transform the unnormalized probabilities into classes?
  3. If I should use CrossEntropy, what should the sizes of my output and label be?

Thank you everyone.

KKKcat

2 Answers


You are using the wrong loss function.

nn.BCEWithLogitsLoss() stands for Binary Cross-Entropy loss: that is a loss for binary labels. In your case, you have 5 labels (0..4).
You should be using nn.CrossEntropyLoss: a loss designed for discrete labels, beyond the binary case.

Your model should output a tensor of shape [32, 5, 256, 256]: for each pixel in the 32 images of the batch, it should output a 5-dim vector of logits. The logits are the "raw" scores for each class, to be normalized later into class probabilities using the softmax function.
For numerical stability and computational efficiency, nn.CrossEntropyLoss does not require you to explicitly compute the softmax of the logits, but does it internally for you. As the documentation reads:

This criterion combines LogSoftmax and NLLLoss in one single class.
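A minimal sketch of how the loss is applied to the shapes you describe (random tensors stand in for your model's output and your labels):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(32, 5, 256, 256)         # raw, unnormalized per-class scores
target = torch.randint(0, 5, (32, 256, 256))  # one class index (0..4) per pixel, dtype long

loss = criterion(logits, target)  # softmax + negative log-likelihood computed internally
```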

Shai
  • got it. If I would like to calculate IOU or pixel accuracy later, should I make the output to be `[32, 256, 256]` (maybe output.argmax(dim=1)) to match my label? – KKKcat May 10 '21 at 15:25
  • @KKKcat argmax over the channel dim should give you the predicted labels – Shai May 10 '21 at 15:46
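Illustrating the comment exchange above: a sketch of recovering per-pixel predictions with argmax and computing pixel accuracy and per-class IoU from them (placeholder tensors again):

```python
import torch

logits = torch.randn(32, 5, 256, 256)         # placeholder model output
target = torch.randint(0, 5, (32, 256, 256))  # placeholder ground truth

preds = logits.argmax(dim=1)  # argmax over the channel dim: [32, 5, 256, 256] -> [32, 256, 256]

pixel_acc = (preds == target).float().mean()  # fraction of correctly labeled pixels

ious = []
for c in range(5):  # intersection-over-union, one class at a time
    inter = ((preds == c) & (target == c)).sum()
    union = ((preds == c) | (target == c)).sum()
    if union > 0:  # skip classes absent from both prediction and target
        ious.append(inter.float() / union.float())
mean_iou = torch.stack(ious).mean()
```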

Given you are dealing with 5 classes, you should use CrossEntropyLoss. Binary cross-entropy, as the name suggests, is a loss function to use when you have a binary segmentation map.

The CrossEntropyLoss function in PyTorch expects the output from your model to be of the shape [batch, num_classes, H, W] (pass this directly to your loss function) and the ground truth to be of shape [batch, H, W], where H, W in your case is 256, 256. Also, please make sure the ground truth is of type long by calling .long() on the tensor.
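For example, a single training step with these shapes (a sketch; the SGD optimizer and random tensors are placeholders for your own setup):

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(num_classes=5)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

images = torch.randn(32, 3, 256, 256)         # [batch, 3, H, W]
labels = torch.randint(0, 5, (32, 256, 256))  # [batch, H, W]

out = model(images)["out"]            # [batch, num_classes, H, W], passed directly to the loss
loss = criterion(out, labels.long())  # ground truth cast to long
optimizer.zero_grad()
loss.backward()
optimizer.step()
```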

Atharva Dubey