I had a binary segmentation task: predicting yes or no for each pixel of an image.
Therefore I trained the network with a binary cross-entropy loss (PyTorch's `BCEWithLogitsLoss`, which combines a sigmoid layer with binary cross-entropy).
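Roughly, this is what my training setup looked like (shapes and tensor names are illustrative, not my actual code):

```python
import torch
import torch.nn as nn

# BCEWithLogitsLoss applies the sigmoid internally, so the model's
# last layer outputs raw logits with no activation.
criterion = nn.BCEWithLogitsLoss()

# Illustrative shapes: batch of 4 single-channel 256x256 masks.
logits = torch.randn(4, 1, 256, 256, requires_grad=True)   # raw model output
targets = torch.randint(0, 2, (4, 1, 256, 256)).float()    # ground-truth 0/1 mask

loss = criterion(logits, targets)
loss.backward()
```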
To compute the metrics, since I needed an output of 0 or 1 for each pixel, I applied a sigmoid to the logits and then considered every value smaller than 0.5 as 0 and every value bigger than 0.5 as 1.
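And for the metrics, roughly this (again a sketch with illustrative shapes, continuing from the training snippet above):

```python
import torch

# logits: raw model outputs for a batch of masks (shapes illustrative).
logits = torch.randn(4, 1, 256, 256)
targets = torch.randint(0, 2, (4, 1, 256, 256)).float()

# Convert logits to per-pixel probabilities, then binarize at 0.5.
probs = torch.sigmoid(logits)
preds = (probs > 0.5).float()   # 1 where probability > 0.5, else 0

# Example metric: per-pixel accuracy.
accuracy = (preds == targets).float().mean()
```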
However, I think this approach may not be correct and that I should have used something like a softmax instead. Could you explain which approach I should have followed, and why?