I am working on segmentation using a UNet; it is a multiclass segmentation problem with 21 classes.
Ideally the last layer would use softmax as the activation, with 21 kernels so that the output depth is 21, matching the number of classes.
But my question is: if we use softmax as the activation in this layer, how does it actually work? If softmax were applied within each feature map, then by its nature it would produce probabilities that sum to 1 across that whole map. But what we need is a value close to 1 at every pixel where the corresponding class is present in the feature map.
Or is the softmax applied depth-wise, i.e. taking the 21 class values at each pixel position and normalizing across them?
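To make the depth-wise interpretation concrete, here is a small NumPy sketch (the shapes and values are made up for illustration, assuming a channels-last layout as in Keras):

```python
import numpy as np

# Hypothetical logits for one image: shape (H, W, C) with C = 21 classes,
# standing in for the UNet's final 21-kernel conv output.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, 21))

# Depth-wise softmax: normalize over the class axis (last axis), so each
# pixel's 21 scores become a probability distribution over the classes.
shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
exp = np.exp(shifted)
probs = exp / exp.sum(axis=-1, keepdims=True)

# For every pixel, the 21 class probabilities sum to 1.
print(np.allclose(probs.sum(axis=-1), 1.0))

# The predicted label map comes from the argmax over the depth axis.
pred = probs.argmax(axis=-1)  # shape (4, 4): one class label per pixel
print(pred.shape)
```

So under this reading the softmax does not normalize across a whole feature map; it normalizes the 21 values stacked in depth at each spatial position.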
I hope I have explained the problem properly.
I have tried sigmoid as the activation, and the results were not good.