I am having trouble understanding what the loss function of a neural network should be. For a binary classification problem, is it mean squared error, as described in this video: https://www.youtube.com/watch?v=5u0jaA3qAGk&t=59s, or is it cross-entropy, as defined here: http://work.caltech.edu/slides/slides09.pdf, and why?

Moreover, in the case of multi-class classification, I think there is something like softmax, but I don't really know how it works. Could someone explain it to me properly?

Thanks!


1 Answer


In theory you can build neural networks using any loss function. You can use mean squared error or cross-entropy loss functions. It boils down to what is going to be the most effective. By most effective, I mean: what is going to allow you to learn the parameters more quickly and/or more accurately.

In practice, most neural networks tend to use cross-entropy. Many beginner classes and tutorials on neural networks will show you mean squared error, as it is probably more intuitive and easier to understand at first.
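For concreteness, here is a minimal NumPy sketch (my own illustration, not from the original answer) that computes both losses on the same binary predictions; the labels and predicted probabilities are made-up values:

```python
import numpy as np

# Ground-truth labels and predicted probabilities for a binary problem
# (illustrative values, not from the original post).
y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.6, 0.8])

# Mean squared error: average of squared differences.
mse = np.mean((y_true - y_pred) ** 2)

# Binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)], averaged over examples.
eps = 1e-12  # small constant to avoid log(0)
bce = -np.mean(y_true * np.log(y_pred + eps)
               + (1.0 - y_true) * np.log(1.0 - y_pred + eps))

print(f"MSE: {mse:.4f}")            # 0.0625
print(f"Cross-entropy: {bce:.4f}")  # ~0.2656
```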

This article explains it in more detail, but let me quote:

When should we use the cross-entropy instead of the quadratic cost? In fact, the cross-entropy is nearly always the better choice, provided the output neurons are sigmoid neurons.

Regarding the softmax function: as you probably know, every neuron has an activation function, and very often that function is a sigmoid. The softmax function is another type of activation function, usually used in the last layer of your neural network. It has a useful property: each output is a value between 0 and 1, and the outputs of all the neurons in the layer sum to 1, so they can be interpreted as probabilities. This makes it very appropriate for multi-class classification, as it gives you a probability for each class, and you can pick the class with the highest probability.
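Here is a minimal sketch of softmax in NumPy (my own illustration, not part of the answer): it exponentiates the layer's raw outputs and normalizes them so they sum to 1.

```python
import numpy as np

def softmax(z):
    """Convert a vector of raw scores (logits) into probabilities."""
    # Subtracting the max is a standard numerical-stability trick;
    # it does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Raw outputs of the last layer for a 3-class problem (illustrative values).
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

print(probs)             # ~[0.659 0.242 0.099]
print(probs.sum())       # 1.0
print(np.argmax(probs))  # 0 -> the class with the highest probability
```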

  • Thanks, it helps. Is it right that mean squared error is not advisable because it gives real importance to the class number (i.e. class 3 is "more important" than class 1)? Also, do you have any article explaining softmax properly? – MysteryGuy Sep 08 '17 at 13:12
  • Not really, the reason cross-entropy is preferred over mean squared error is mainly rooted in math and derivatives. The derivatives of the cost function are used in the backpropagation algorithm. And here is an article about softmax: http://dataaspirant.com/2017/03/07/difference-between-softmax-function-and-sigmoid-function/ – Olivier De Meulder Sep 08 '17 at 13:16
  • I don't really get the point about why cross-entropy is preferred over mean squared error... Why should it be linked to derivatives? They are both differentiable... Could you elaborate a bit more, please? – MysteryGuy Sep 08 '17 at 13:19
  • [I haven't read the articles entirely yet, but at a glance they look pretty interesting] – MysteryGuy Sep 08 '17 at 13:20
  • @MysteryGuy maybe a bit late, but the thing is that the derivatives of those functions are not the same, and cross-entropy makes training converge faster – Javi Feb 01 '19 at 14:11
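To expand on that last point with my own sketch (not part of the original thread): for a single sigmoid output neuron, the gradient of the quadratic cost with respect to the pre-activation z carries an extra σ'(z) factor, which is nearly zero when the neuron is saturated, whereas the cross-entropy gradient is simply (a - y), so learning stays fast even when the neuron is badly wrong.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A saturated, badly-wrong output neuron: the target is 1, but z is very negative.
y, z = 1.0, -5.0
a = sigmoid(z)  # ~0.0067

# Gradient of the quadratic cost 0.5*(a - y)^2 with respect to z:
# (a - y) * sigma'(z); the sigma'(z) = a*(1 - a) factor kills the gradient.
grad_mse = (a - y) * a * (1.0 - a)  # ~ -0.0066

# Gradient of the cross-entropy cost with respect to z:
# simply (a - y), so the more wrong the neuron, the larger the update.
grad_ce = a - y                     # ~ -0.9933

print(grad_mse, grad_ce)
```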