
I made a neural network with Keras in Python and cannot really understand what the value of the loss function means.

First, some general information: I worked with the poker hand dataset, which has classes 0-9 that I encoded as one-hot vectors. I used the softmax activation in the last layer, so for each of the 10 entries of the output vector my model gives the probability that the sample belongs to that class. For example: my true label is (0,1,0,0,0,0,0,0,0,0), which means class 1 (the classes 0-9 range from no card to royal flush), and class 1 means one pair (if you know poker). From the neural net I get outputs like (0.4, 0.2, 0.1, 0.1, 0.2, 0, 0, 0, 0, 0), which means that my sample belongs to class 0 with 40 percent probability, to class 1 with 20 percent, and so on.
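For reference, a minimal sketch of that encoding (the labels here are made up; `to_categorical` is one way to produce the one-hot vectors):

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

# Class labels 0-9 (no card ... royal flush); example values only
y = np.array([1, 0, 5])  # e.g. class 1 = one pair

# One-hot encoding: class 1 becomes (0,1,0,0,0,0,0,0,0,0)
y_onehot = to_categorical(y, num_classes=10)
print(y_onehot[0])  # [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
```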

Alright! I also used binary cross entropy as the loss, the accuracy metric, and the RMSprop optimizer. When I use model.evaluate() from Keras, I get something like 0.16 for the loss and I do not know how to interpret this. Does it mean that on average my predictions deviate by 0.16 from the true values? So if my prediction for class 0 is 0.5, could the true value just as well be 0.66 or 0.34? Or how should I interpret it? A sketch of my setup is below; the layer sizes and data names are placeholders, not my exact model:
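```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Placeholder architecture: 10 input features (suit + rank for 5 cards),
# 10 output classes with softmax, as described above
model = Sequential([
    Dense(64, activation='relu', input_shape=(10,)),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',  # the loss in question
              metrics=['accuracy'])

# evaluate() returns [loss, accuracy] averaged over the given data,
# e.g.: loss, acc = model.evaluate(X_test, y_test_onehot)
```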

Please send help!

Eli Hektor
  • Why do you use binary cross entropy when you have a multi-class problem? – Code Pope May 04 '20 at 09:38
  • https://developers.google.com/machine-learning/crash-course/descending-into-ml/training-and-loss might be a good start to read up on. Once you understand the loss, you can then look into what loss is being used in your model. Should be MSE as well. – Jason Chia May 04 '20 at 09:38
  • Thanks Jason, I think I understand the loss; my problem is more the computation in Keras's model.evaluate()! I use binary crossentropy because I first use one-hot encoding. – Eli Hektor May 04 '20 at 09:58

1 Answer


First of all, according to your problem definition you have a multi-class problem. Thus, you should use categorical_crossentropy. Binary cross entropy is for two-class problems or for multi-label classification.
But in general the value of the loss function only has a relative meaning. First of all, you have to understand what the cross entropy means. The formula is:
$$\mathrm{CE} = -\sum_{c=1}^{M} y_{o,c}\,\log(p_{o,c})$$
where $M$ is the number of classes, $y_{o,c}$ is the binary indicator (0 or 1) of whether class label $c$ is the correct classification for observation $o$, and $p_{o,c}$ is the predicted probability that observation $o$ is of class $c$.
For binary cross entropy, $M$ is equal to 2; for categorical cross entropy, $M > 2$. In both cases, the cross entropy decreases as the predicted probability converges to the true label:
(Plot: the log loss falls toward 0 as the predicted probability of the true label approaches 1, and grows without bound as it approaches 0.)
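To make this concrete, here is a small sketch (using the probabilities from your example) that computes the categorical cross entropy for a single observation by hand:

```python
import numpy as np

y = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0])            # true label: class 1, one-hot
p = np.array([0.4, 0.2, 0.1, 0.1, 0.2, 0, 0, 0, 0, 0])  # predicted probabilities

# Categorical cross entropy: -sum_c y_c * log(p_c).
# Only the term for the correct class survives, so this is -log(p[1]).
eps = 1e-7                         # avoid log(0), as Keras does internally
ce = -np.sum(y * np.log(p + eps))
print(ce)                          # -log(0.2) ~= 1.61
```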

Now let's take your example, where you have 10 classes and your true label is (0,1,0,0,0,0,0,0,0,0). If you have a loss of 0.16, it means that
$$-\log(p_{o,1}) = 0.16 \;\Rightarrow\; p_{o,1} = e^{-0.16} \approx 0.85,$$
which means that your model has assigned a probability of about 0.85 to the correct label.
Therefore, the loss function gives you the negative log of the correct classification probability. As Keras computes the loss over whole batches, it is the average of the negative log of the correct classification probability over all the data in the specific batch. If you use the evaluate function, it is that average over all the data you are evaluating.
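Assuming categorical cross entropy, you can therefore invert the reported loss to recover the (geometric) average probability the model assigns to the correct class:

```python
import numpy as np

loss = 0.16                 # value reported by model.evaluate()
p_correct = np.exp(-loss)   # invert -log(p) = loss
print(p_correct)            # ~0.85
```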

Code Pope
  • Thank you! This is exactly what I wanted to see/hear :) I understand it now! But one more question: if I used one-hot encoding to represent my 10 classes as vectors, i.e. (1,0,0,...) for class 0, (0,1,0,0,...) for class 1 and so on, why shouldn't I use binary cross entropy? – Eli Hektor May 04 '20 at 11:21