For binary classification (assigning an input to one of 2 classes), 99% of the examples I have seen use a NN with a single output and a sigmoid activation, followed by a binary cross-entropy loss. Another option I thought of is to have the last layer produce 2 outputs and use categorical cross-entropy with C=2 classes, but I have never seen that in any example. Is there a reason for that?
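
To make the two setups concrete, here is a minimal NumPy sketch of what I mean. The function names and the example logits are made up for illustration, and I am assuming the 2-output version puts a softmax on the last layer before the categorical cross-entropy.

```python
import numpy as np

def bce_from_logit(z, y):
    """Single-output setup: sigmoid on one logit, then binary cross-entropy."""
    p = 1.0 / (1.0 + np.exp(-z))  # predicted P(class 1)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def cce_from_logits(logits, y):
    """Two-output setup: softmax over 2 logits, then categorical cross-entropy."""
    # Subtract the row-wise max before exponentiating, for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true class for each example.
    return -log_probs[np.arange(len(y)), y]

# Hypothetical logits for 3 examples, with labels in {0, 1}.
two_logits = np.array([[0.2, 1.5], [2.0, -1.0], [0.0, 0.3]])
labels = np.array([1, 0, 1])

# A single logit chosen as the difference between the two output logits.
one_logit = two_logits[:, 1] - two_logits[:, 0]

loss_single = bce_from_logit(one_logit, labels)
loss_double = cce_from_logits(two_logits, labels)
print(loss_single)
print(loss_double)
```

As far as I can tell, when the single logit equals the difference of the two logits, the per-example losses come out the same, since a softmax over two logits reduces to a sigmoid of their difference.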
Thanks