I had a binary segmentation task: I had to predict yes or no for each pixel of an image.

I therefore used a binary cross entropy loss (PyTorch's `BCEWithLogitsLoss`, which combines a sigmoid layer and binary cross entropy in a single module) to train the network.

To compute the metrics, since I needed a 0-or-1 output for each pixel, I applied the sigmoid function and then treated everything below 0.5 as 0 and everything above 0.5 as 1.
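In code, the pipeline described above looks roughly like this (a minimal sketch rather than the actual code behind the question; the model and tensor shapes are placeholder assumptions):

```python
import torch
import torch.nn as nn

# Placeholder stand-in for a real segmentation network: one logit per pixel.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)

# BCEWithLogitsLoss applies the sigmoid and the binary cross entropy in one step.
criterion = nn.BCEWithLogitsLoss()

images = torch.randn(4, 3, 64, 64)                     # dummy batch
targets = torch.randint(0, 2, (4, 1, 64, 64)).float()  # 0/1 mask per pixel

# Training step: the loss is computed on raw logits.
loss = criterion(model(images), targets)
loss.backward()

# Evaluation: sigmoid, then a fixed 0.5 threshold, as described above.
with torch.no_grad():
    probs = torch.sigmoid(model(images))
    preds = (probs >= 0.5).float()
```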

However, I think this approach is not correct and that I should have used something like a softmax instead. Could you explain which approach I should have followed, and why?

emanuele_f
  • Your method sounds good; just select your threshold to be the one that gives you the best score on some validation data (see the sketch after these comments). 0.5 might be a very bad threshold. – jhso Mar 03 '22 at 23:00
  • I did not think about that! Could you explain why I should treat the threshold as a hyper-parameter? Intuitively 0.5 seems the most rational choice to me, since it is as far from 1 as it is from 0, but I could not justify it mathematically. – emanuele_f Mar 03 '22 at 23:33
  • 0.5 is usually a good option when your classes are balanced. However, if you plot the distribution of your predicted probabilities for class 0 and class 1, you will often see that the best cut-off point (the high end of class 0 and the low end of class 1) is not at 0.5. When you have class imbalance, this point will shift towards the dominant class. – jhso Mar 03 '22 at 23:42
  • I used a weighted binary cross entropy, though; would it be the same in that case? Sorry for not having described my approach correctly. – emanuele_f Mar 03 '22 at 23:49
  • That should still be fine, just give it a try. If you want a general report on how your model is doing, you should use the area under the ROC curve (look up sklearn's implementation); this gives you a threshold-free measure of accuracy. – jhso Mar 04 '22 at 00:13
  • For binary classification, sigmoid + CE is equivalent to two outputs + softmax + CE. Do the math, it's a good exercise. – Shai Mar 06 '22 at 20:02
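To make the threshold-selection and AUC suggestions from the comments above concrete, here is a minimal sketch of a validation-set threshold sweep plus the threshold-free ROC AUC report; the arrays are placeholders and the choice of F1 as the selection score is an assumption, not part of the original exchange:

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# probs: sigmoid outputs flattened to 1D; labels: matching 0/1 ground truth.
# Both are assumed to come from a held-out validation set (placeholders here).
rng = np.random.default_rng(0)
probs = rng.random(10_000)
labels = (rng.random(10_000) > 0.7).astype(int)   # deliberately imbalanced

# Sweep candidate thresholds and keep the one with the best validation F1.
thresholds = np.linspace(0.05, 0.95, 19)
scores = [f1_score(labels, (probs >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]
print(f"best threshold: {best_t:.2f}, F1: {max(scores):.3f}")

# Threshold-free summary of model quality, as suggested in the comments.
print(f"ROC AUC: {roc_auc_score(labels, probs):.3f}")
```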
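And a quick numeric check of the sigmoid-vs-softmax equivalence claimed in the last comment, for the special case where the class-0 logit is fixed at 0 (the general two-logit case reduces to a sigmoid of the logit difference):

```python
import torch
import torch.nn.functional as F

z = torch.randn(1000)                      # one logit per sample
y = torch.randint(0, 2, (1000,)).float()   # binary targets

# Single-output formulation: sigmoid + binary cross entropy.
bce = F.binary_cross_entropy_with_logits(z, y)

# Two-output formulation: logits [0, z], softmax + cross entropy.
# softmax([0, z])[1] == sigmoid(z), so both losses should match.
two_logits = torch.stack([torch.zeros_like(z), z], dim=1)
ce = F.cross_entropy(two_logits, y.long())

print(torch.allclose(bce, ce, atol=1e-6))  # True
```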

0 Answers