I am calculating the Dice score for a binary segmentation task. Some of my ground-truth masks contain no label at all, i.e. they are all zeros. When I run inference with different batch sizes I get different results, and they are worst for batch size = 1. I found the reason, shown in the following figure: the score is averaged over all cases, even those where TP = 0.

[results descriptions][1]

[1]: https://i.stack.imgur.com/mHj3o.png
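To make the batch-size effect concrete, here is a minimal NumPy sketch. It assumes the Dice score is computed over all pixels of a batch at once (with a small smoothing constant `eps`) and that the per-batch scores are then averaged. The function names `dice_pooled` and `mean_dice_over_batches` and the toy 2×2 masks are hypothetical, for illustration only.

```python
import numpy as np

def dice_pooled(pred, gt, eps=1e-6):
    """Dice over every pixel in the batch at once; eps avoids 0/0.

    With this smoothing, an empty prediction on an empty mask scores 1.0,
    while any false positive on an empty mask scores roughly 0.
    """
    inter = np.sum(pred * gt)
    return (2.0 * inter + eps) / (np.sum(pred) + np.sum(gt) + eps)

def mean_dice_over_batches(pred, gt, batch_size, eps=1e-6):
    """Split the data into batches, score each batch, and average the scores."""
    scores = [
        dice_pooled(pred[i:i + batch_size], gt[i:i + batch_size], eps)
        for i in range(0, len(pred), batch_size)
    ]
    return float(np.mean(scores))

# Hypothetical toy data: image 0 has foreground, image 1 is all zeros.
gt = np.array([
    [[1, 1], [0, 0]],
    [[0, 0], [0, 0]],
])
pred = np.array([
    [[1, 1], [0, 0]],  # perfect prediction
    [[0, 0], [0, 1]],  # one false-positive pixel on the empty mask
])

for bs in (1, 2):
    print(bs, round(mean_dice_over_batches(pred, gt, bs), 3))
# 1 0.5
# 2 0.8
```

With `batch_size=1`, the empty-ground-truth image with a single false-positive pixel scores about 0 and pulls the mean down to 0.5; with `batch_size=2`, that false positive is pooled with the correct pixels of the other image, giving 0.8.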
What is the logical solution, and how do experts deal with this problem? One possible solution is to calculate the Dice score only for those images whose ground truth contains foreground (ground truth > 0). Is this an acceptable approach when publishing results? I haven't seen any paper mention this issue, so any link to published work that deals with this problem would be appreciated.

Thank you.
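A batch-size-independent alternative is to compute Dice per image and decide explicitly what happens to images whose ground truth is empty. The sketch below assumes masks stored as NumPy arrays of shape (N, H, W) and supports two hypothetical policies: `"skip"`, which implements the approach proposed above (only images with ground truth > 0 are scored), and `"fp_aware"`, an alternative convention that scores an empty image 1.0 if the prediction is also empty and 0.0 otherwise. It also counts empty-ground-truth images that received a false positive, since skipping them would otherwise hide those errors. The names `per_image_dice` and `empty_policy` and the policy strings are illustrative, not taken from any library.

```python
import numpy as np

def per_image_dice(pred, gt, empty_policy="skip"):
    """Per-image Dice for binary masks of shape (N, H, W).

    empty_policy (hypothetical name) controls images whose ground truth is empty:
      "skip"     -> leave them out of the mean
      "fp_aware" -> score 1.0 if the prediction is also empty, else 0.0
    Returns (mean Dice, number of images scored, empty-GT images with a false positive).
    """
    if empty_policy not in ("skip", "fp_aware"):
        raise ValueError(f"unknown empty_policy: {empty_policy!r}")
    scores = []
    empty_with_fp = 0
    for p, g in zip(pred.astype(bool), gt.astype(bool)):
        if not g.any():
            # Empty ground truth: any predicted foreground is a false positive.
            if p.any():
                empty_with_fp += 1
            if empty_policy == "fp_aware":
                scores.append(0.0 if p.any() else 1.0)
            continue
        inter = np.logical_and(p, g).sum()
        # g is non-empty here, so the denominator is always > 0.
        scores.append(2.0 * inter / (p.sum() + g.sum()))
    mean = float(np.mean(scores)) if scores else float("nan")
    return mean, len(scores), empty_with_fp

# Hypothetical toy data: one labelled image, two all-zero ground truths.
gt = np.array([
    [[1, 1], [0, 0]],
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
])
pred = np.array([
    [[1, 1], [0, 0]],  # perfect
    [[0, 0], [0, 1]],  # one false-positive pixel on an empty mask
    [[0, 0], [0, 0]],  # correctly empty
])

print(per_image_dice(pred, gt, "skip"))      # (1.0, 1, 1)
print(per_image_dice(pred, gt, "fp_aware"))  # mean 2/3, 3 images scored, 1 false positive
```

Here `"skip"` gives a mean of 1.0 over the single scored image but still reports one false positive on an empty mask, while `"fp_aware"` folds that error into the mean (2/3). Because each image is scored independently, neither result changes with the inference batch size.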