If $y$ is the label and $\hat{y}$ is my prediction, would the following formula for cross-entropy with $C$ possible classes be right:

$$H(y, \hat{y}) = -\sum_{c=1}^{C} y_c \log(\hat{y}_c)$$
In the case of binary cross-entropy, can I just drop the sum over $C$, or equivalently set $C = 1$?
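To check my understanding, here is a small NumPy sketch of both cases (the function names, the `eps` clipping, and the assumption that $y$ is one-hot and $\hat{y}$ comes from a softmax/sigmoid are just my own choices):

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    """Categorical cross-entropy for one example.

    y     : one-hot label vector of length C
    y_hat : predicted probabilities of length C (e.g. softmax output)
    """
    y_hat = np.clip(y_hat, eps, 1.0)      # avoid log(0)
    return -np.sum(y * np.log(y_hat))

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross-entropy for one example, where y is 0 or 1 and
    y_hat is the predicted probability of class 1.
    This is the usual two-term form I have seen; I am not sure whether
    it is the same thing as just dropping the sum over C above."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```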
For calculating the loss over the whole dataset or a mini-batch of size $M$, I just add $\frac{1}{M}\sum_{m=1}^{M}$ in front of the sum over $c$, right? That is,

$$H = -\frac{1}{M}\sum_{m=1}^{M}\sum_{c=1}^{C} y_{m,c} \log(\hat{y}_{m,c})$$
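For the mini-batch version I would write something like this (again just my own sketch, assuming `Y` and `Y_hat` are arrays of shape `(M, C)` with one-hot labels and predicted probabilities):

```python
def batch_cross_entropy(Y, Y_hat, eps=1e-12):
    """Average cross-entropy over a mini-batch.

    Y     : (M, C) one-hot labels
    Y_hat : (M, C) predicted probabilities
    """
    Y_hat = np.clip(Y_hat, eps, 1.0)
    # inner sum over classes c, then 1/M * sum (i.e. mean) over examples m
    return np.mean(-np.sum(Y * np.log(Y_hat), axis=1))

# quick check with made-up numbers
Y = np.array([[1, 0, 0], [0, 1, 0]])
Y_hat = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(batch_cross_entropy(Y, Y_hat))  # -(ln(0.7) + ln(0.8)) / 2 ≈ 0.29
```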
Thanks!