I'm training a CNN for multi-label classification with around 160 labels. With a standard CNN architecture, a sigmoid output layer, and binary_crossentropy as the loss, the network is biased toward predicting zeros: only a few of the 160 labels are positive for any given sample, and since the loss averages over all outputs, it stays low even when every output is zero, including the correct labels. Does anyone have a solution?
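To make the imbalance concrete, here is a small NumPy sketch (the label count and positive indices are made up): with only 3 positives out of 160, a prediction that misses every positive still averages out to a small loss, because the many easy negatives dilute the few missed positives. This is a plausible reading of why training can settle near all-zeros early on, not a claim about any specific framework's internals.

```python
import numpy as np

# Hypothetical sample: 160 labels, only 3 of them active.
n_labels = 160
positives = [4, 27, 131]          # made-up indices
y_true = np.zeros(n_labels)
y_true[positives] = 1.0

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean over all 160 independent sigmoid outputs,
    # as Keras' binary_crossentropy does per sample.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

# Prediction A: near-zero everywhere, i.e. every positive is missed.
all_zeros = np.full(n_labels, 0.01)
# Prediction B: confident on the 3 true labels, near-zero elsewhere.
correct = np.full(n_labels, 0.01)
correct[positives] = 0.99

loss_zeros = binary_crossentropy(y_true, all_zeros)
loss_correct = binary_crossentropy(y_true, correct)

# Each missed positive costs -log(0.01) ~ 4.6 on its own, but averaged
# over 160 outputs the all-zeros prediction still scores under 0.1.
print(loss_zeros, loss_correct)
```

Note that the correct prediction does still achieve a lower loss than all-zeros; the problem is that the penalty for missing every positive is heavily diluted, so the gradient signal pushing the positive outputs up is weak.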
1 Answer
Use categorical cross-entropy instead of binary cross-entropy, and softmax instead of sigmoid in the output layer.
