17

Making sure I am getting this right:

If we use sklearn.metrics.log_loss standalone, i.e. log_loss(y_true, y_pred), it generates a positive score -- the smaller the score, the better the performance.

However, if we use 'neg_log_loss' as the scoring scheme, as in cross_val_score, the score is negative -- the bigger the score, the better the performance.

And this is because the scoring scheme is built to be consistent with the other scoring schemes: since, in general, higher is better, the usual log_loss is negated to follow that convention, and it is done solely for that purpose. Is this understanding correct?

[Background: I got positive scores from metrics.log_loss and negative scores from 'neg_log_loss', and both refer to the same documentation page.]
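For reference, a minimal sketch of the kind of comparison I mean (the toy dataset and classifier here are just placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Standalone metric: positive number, smaller is better.
print(log_loss(y, clf.predict_proba(X)))

# 'neg_log_loss' scorer inside cross_val_score: negative number, bigger is better.
print(cross_val_score(clf, X, y, scoring='neg_log_loss').mean())
```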

jtlz2
Max
  • I was wondering the same thing – O.rka Jul 07 '17 at 22:13
  • When you say "the bigger the score, the better the performance", do you mean (a) the bigger the absolute value of the score (i.e. the more negative the score), or (b) the more positive the score? – jtlz2 Jan 09 '23 at 09:49

1 Answer

9

sklearn.metrics.log_loss is an implementation of the error metric as typically defined and, like most error metrics, it is a positive number. It is a metric that is generally minimized (like mean squared error for regression), in contrast to metrics such as accuracy, which are maximized.

neg_log_loss is hence a technicality that turns this error into a utility value, which allows sklearn's optimizing functions and classes to always maximize a score without having to change their behaviour for each metric (these include, for instance, cross_val_score, GridSearchCV, RandomizedSearchCV, and others).
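As a rough sketch (the toy dataset, classifier, and use of get_scorer here are just for illustration), you can verify that the scorer simply returns the negated metric:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer, log_loss

X, y = make_classification(random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

loss = log_loss(y, clf.predict_proba(X))       # positive, to be minimized
score = get_scorer('neg_log_loss')(clf, X, y)  # negative, to be maximized

assert abs(score + loss) < 1e-9  # score == -loss
```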

petezurich
Marcus V.
  • so, `neg_log_loss` is simply equal to `- log_loss`? – CodeUnsolved Oct 08 '18 at 03:46
  • Yes, `neg_log_loss` is simply equal to `- log_loss`, as it says e.g. [here](http://scikit-learn.org/stable/modules/model_evaluation.html#common-cases-predefined-values): "Thus metrics which measure the distance between the model and the data [...] return the negated value of the metric." – Marcus V. Oct 08 '18 at 08:38