
Computer vision and deep learning literature usually says to use binary_crossentropy for a binary (two-class) problem and categorical_crossentropy for problems with more than two classes. Now I am wondering: is there any reason not to use the latter for a two-class problem as well?

Matthias
  • There are already some good threads on this topic: https://stats.stackexchange.com/questions/260505/machine-learning-should-i-use-a-categorical-cross-entropy-or-binary-cross-entro https://stats.stackexchange.com/questions/357541/what-is-the-difference-between-binary-cross-entropy-and-categorical-cross-entrop – techytushar Dec 06 '19 at 15:44

1 Answer

  • categorical_crossentropy:
    • accepts only one correct class per sample
    • uses only the output neuron of the true class in the crossentropy calculation
  • binary_crossentropy:
    • accepts many correct classes per sample
    • computes the crossentropy for every output neuron, treating each neuron as its own two-class problem with targets 0 and 1
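To make the contrast concrete, here is a minimal NumPy sketch of the two losses. It is not the Keras implementation: the function bodies, the clipping constant `EPS`, and the choice to average the binary loss over the output neurons are assumptions made for illustration, and the sample values are made up.

```python
import numpy as np

EPS = 1e-7  # clip to avoid log(0); the exact value is an arbitrary choice here


def categorical_crossentropy(y_true, y_pred):
    """Per-sample loss: -sum_i y_true[i] * ln(y_pred[i]).

    With a one-hot y_true, only the true neuron's term is non-zero.
    """
    y_pred = np.clip(y_pred, EPS, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


def binary_crossentropy(y_true, y_pred):
    """Per-sample loss, averaged over the output neurons.

    Every neuron contributes: it is scored as its own 0-versus-1 problem.
    """
    y_pred = np.clip(y_pred, EPS, 1.0 - EPS)
    per_neuron = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return np.mean(per_neuron, axis=-1)


# One correct class out of three (softmax-style output that sums to 1).
one_hot = np.array([0.0, 1.0, 0.0])
softmax_out = np.array([0.2, 0.7, 0.1])
cce = categorical_crossentropy(one_hot, softmax_out)  # equals -ln(0.7)

# Two correct classes out of three (independent sigmoid-style outputs).
multi_hot = np.array([1.0, 1.0, 0.0])
sigmoid_out = np.array([0.9, 0.7, 0.2])
bce = binary_crossentropy(multi_hot, sigmoid_out)

print(cce, bce)
```

In the categorical case the predictions for the two wrong classes never enter the loss directly; in the binary case the prediction 0.2 for the class labelled 0 is penalised as well.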

A 2-class problem can be modeled as:

  • 2-neuron output with only one correct class: softmax + categorical_crossentropy
  • 1-neuron output, one class is 0, the other is 1: sigmoid + binary_crossentropy
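As a rough check that these two setups describe the same model, the sketch below compares them for one sample using plain NumPy. The logit values are hypothetical, and the helper functions `softmax` and `sigmoid` are written out here rather than taken from any library. The key assumption is that the single sigmoid neuron receives the difference of the two softmax logits.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / np.sum(e)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical logits for one sample whose true class is 1.
logits = np.array([0.3, 1.5])  # 2-neuron model, one logit per class
true_class = 1

# Option 1: softmax + categorical crossentropy over two neurons.
p_two = softmax(logits)
loss_two = -np.log(p_two[true_class])

# Option 2: sigmoid + binary crossentropy over one neuron.
# A single logit equal to the difference of the two logits gives the same
# probability for class 1 as the softmax does.
z = logits[1] - logits[0]
p_one = sigmoid(z)
y = float(true_class)
loss_one = -(y * np.log(p_one) + (1.0 - y) * np.log(1.0 - p_one))

print(p_two[1], p_one)     # same probability for class 1
print(loss_two, loss_one)  # same loss value
```

Under that assumption the two formulations give the same probability and the same loss, so for two classes the choice is mostly about output shape and label encoding (one-hot versus a single 0/1 value).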

Explanation

[Image: the categorical crossentropy equation (first) and the binary crossentropy equation (second)]
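The original picture is not reproduced in this text. Assuming it showed the standard textbook definitions, the two equations would read roughly as follows, where the symbol names follow the y_true / y_pred convention used in this answer and C (the number of output neurons) is a name introduced here:

```latex
% Categorical crossentropy over C output neurons (y_true is one-hot)
L_{\text{categorical}} = -\sum_{i=1}^{C} y_{\text{true},i}\,\ln\!\left(y_{\text{pred},i}\right)

% Binary crossentropy for a single output neuron (y_true is 0 or 1)
L_{\text{binary}} = -\left[\, y_{\text{true}}\,\ln\!\left(y_{\text{pred}}\right)
    + \left(1 - y_{\text{true}}\right)\ln\!\left(1 - y_{\text{pred}}\right) \right]
```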

Notice how in categorical crossentropy (the first equation), y_true is 1 only for the true neuron and 0 for every other neuron, so the terms of all other neurons vanish.

The equation can be reduced to simply: -ln(y_pred[correct_label]).

Now notice how binary crossentropy (the second equation in the picture) has two terms, one for considering 1 as the correct class, another for considering 0 as the correct class.
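A small numeric sketch of how those two terms switch on and off for a single neuron. The helper `binary_terms` and the prediction value 0.8 are made up for illustration.

```python
import numpy as np


def binary_terms(y_true, y_pred):
    """Return the two terms of binary crossentropy for one neuron."""
    term_if_one = -y_true * np.log(y_pred)                 # active when y_true == 1
    term_if_zero = -(1.0 - y_true) * np.log(1.0 - y_pred)  # active when y_true == 0
    return term_if_one, term_if_zero


y_pred = 0.8  # hypothetical sigmoid output

pos_one, pos_zero = binary_terms(1.0, y_pred)  # label 1: only the first term survives
neg_one, neg_zero = binary_terms(0.0, y_pred)  # label 0: only the second term survives

print(pos_one + pos_zero)  # -ln(0.8)
print(neg_one + neg_zero)  # -ln(0.2)
```

The same prediction is cheap when the label is 1 and expensive when the label is 0, which is what lets a single neuron represent both classes.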

Daniel Möller