How to implement custom logloss with identical behavior to binary objective in LightGBM?

Question

I am trying to implement my own loss function for binary classification. To get started, I want to reproduce the exact behavior of the binary objective. In particular, I want that:

The losses of both objectives have the same scale
The training and validation curves have a similar slope
predict_proba(X) returns probabilities

None of this is the case for the code below:

import sklearn.datasets
import lightgbm as lgb
import numpy as np

X, y = sklearn.datasets.load_iris(return_X_y=True)
X, y = X[y <= 1], y[y <= 1]

def loglikelihood(labels, preds):
    preds = 1. / (1. + np.exp(-preds))
    grad = preds - labels
    hess = preds * (1. - preds)
    return grad, hess

model = lgb.LGBMClassifier(objective=loglikelihood)  # or "binary"
model.fit(X, y, eval_set=[(X, y)], eval_metric="binary_logloss")
lgb.plot_metric(model.evals_result_)

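With objective="binary": [plot of binary_logloss per iteration, image not preserved]

With objective=loglikelihood the curve is not even smooth: [plot of binary_logloss per iteration, image not preserved]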

Moreover, sigmoid has to be applied to model.predict_proba(X) to get probabilities for loglikelihood (as I have figured out from https://github.com/Microsoft/LightGBM/issues/2136).

Is it possible to get the same behavior with a custom loss function? Does anybody understand where all these differences come from?

Viktoriya Malyasova · Accepted Answer · 2020-11-21T19:07:42.553

Looking at the output of model.predict_proba(X) in each case, we can see that the model trained with the built-in binary objective returns probabilities, while the model trained with the custom objective returns logits (raw scores).
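To make the logits-versus-probabilities distinction concrete, here is a minimal sketch of converting raw scores into a two-column probability array. The helper name raw_scores_to_proba and the sample scores are made up for illustration. It assumes the custom-objective model hands back a 1-D array of logits for the positive class, for example from model.predict(X, raw_score=True); check this against your LightGBM version.

```python
import numpy as np

def raw_scores_to_proba(raw_scores):
    """Map 1-D raw binary scores (logits) to an (n_samples, 2) probability array."""
    raw_scores = np.asarray(raw_scores, dtype=float)
    # sigmoid(x) written as exp(-log(1 + exp(-x))), which does not overflow for large |x|
    p1 = np.exp(-np.logaddexp(0.0, -raw_scores))
    # column 0: P(class 0), column 1: P(class 1)
    return np.column_stack([1.0 - p1, p1])

# Hypothetical raw scores standing in for the custom-objective model's output
raw = np.array([-3.0, 0.0, 2.5])
proba = raw_scores_to_proba(raw)
```

Each row of proba sums to 1, and a raw score of 0 maps to a probability of 0.5 for both classes.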

The built-in evaluation function takes probabilities as input. To match the custom objective, we need a custom evaluation function that takes logits as input.

Here is how you could write this. I've changed the sigmoid calculation so that it doesn't overflow if logit is a large negative number.

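def loglikelihood(labels, logits):
    # numerically stable sigmoid: the selected branch never exponentiates a positive number
    preds = np.where(logits >= 0,
                     1. / (1. + np.exp(-logits)),
                     np.exp(logits) / (1. + np.exp(logits)))
    # gradient and Hessian of the binary log loss with respect to the logits
    grad = preds - labels
    hess = preds * (1. - preds)
    return grad, hess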

def my_eval(labels, logits):
    # numerically stable log-sigmoid
    logsigmoid = np.where(logits >= 0,
                          -np.log(1 + np.exp(-logits)),
                          logits - np.log(1 + np.exp(logits)))
    # mean binary log loss, computed directly from the logits
    loss = (-logsigmoid + logits * (1 - labels)).mean()
    # (metric name, value, is_higher_better)
    return "error", loss, False


model1 = lgb.LGBMClassifier(objective='binary')
model1.fit(X, y, eval_set=[(X, y)], eval_metric="binary_logloss")
model2 = lgb.LGBMClassifier(objective=loglikelihood)
model2.fit(X, y, eval_set=[(X, y)], eval_metric=my_eval)

Now the results are the same.
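One caveat about the np.where pattern above: NumPy evaluates both branches before selecting, so for very large |logit| the discarded branch can still emit overflow or invalid-value warnings even though the selected values are correct. A possible alternative, shown here as a sketch and not as what LightGBM does internally, is to build both the sigmoid and the log loss on np.logaddexp. The function names are my own. The snippet also checks that the logit-based loss agrees with the usual probability-based formula.

```python
import numpy as np

def stable_sigmoid(logits):
    # sigmoid(x) = exp(-log(1 + exp(-x)))
    return np.exp(-np.logaddexp(0.0, -logits))

def logloss_from_logits(labels, logits):
    # -y*log(p) - (1-y)*log(1-p) with p = sigmoid(x) simplifies to log(1 + exp(x)) - y*x
    return np.mean(np.logaddexp(0.0, logits) - labels * logits)

def logloss_from_probs(labels, probs):
    # textbook binary log loss on probabilities
    return np.mean(-labels * np.log(probs) - (1.0 - labels) * np.log(1.0 - probs))

labels = np.array([0.0, 1.0, 1.0, 0.0])
logits = np.array([-2.0, 0.5, 3.0, 1.0])
loss_a = logloss_from_logits(labels, logits)
loss_b = logloss_from_probs(labels, stable_sigmoid(logits))

# Extreme logits: turn overflow/invalid warnings into errors to show none occur
extreme = np.array([-1000.0, 1000.0])
with np.errstate(over="raise", invalid="raise"):
    extreme_probs = stable_sigmoid(extreme)
    extreme_loss = logloss_from_logits(np.array([0.0, 1.0]), extreme)
```

Here loss_a and loss_b agree to floating-point precision, and the extreme logits give probabilities of 0 and 1 without raising.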

edited Nov 21 '20 at 19:07

answered Oct 26 '19 at 17:28


Very cool, thank you. I assume then that there is no way to make predict_proba return probabilities in this case? – Joel Oct 27 '19 at 04:27
Not that I know of. – Viktoriya Malyasova Nov 21 '20 at 19:07
These terms `grad = preds - labels` and `hess = preds * (1. - preds)` are the first-order and second-order derivatives of what? – Milind Dalvi Dec 17 '20 at 05:08
@Milind Dalvi, of the binary logloss: loss = log(preds) * labels + log(1 - preds) * (1 - labels), preds = sigmoid(logits). grad = d loss / d logits, hess = d^2 loss / d logits^2 – Viktoriya Malyasova Dec 17 '20 at 08:32
@ViktoriyaMalyasova Interesting, because what I have seen is that the first- and second-order derivatives of `- y log(1/(1+e^-x)) - (1-y) log(1-1/(1+e^-x))` work, but those of `y log(1/(1+e^-x)) + (1-y) log(1-1/(1+e^-x))` do not! – Milind Dalvi Dec 17 '20 at 08:41
Oops, sorry, forgot the minus. – Viktoriya Malyasova Dec 17 '20 at 08:43
loss = - log(preds) * labels - log(1 - preds) * (1 - labels) – Viktoriya Malyasova Dec 17 '20 at 08:44
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/226085/discussion-between-milind-dalvi-and-viktoriya-malyasova). – Milind Dalvi Dec 17 '20 at 10:02
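For anyone who wants to check the derivative claim from the comments numerically: the sketch below compares grad = preds - labels and hess = preds * (1 - preds) against central finite differences of the negated-log-likelihood form, loss = - log(preds) * labels - log(1 - preds) * (1 - labels). The sample labels, logits, and step size eps are arbitrary choices for illustration.

```python
import numpy as np

def logloss(y, x):
    # per-sample binary log loss as a function of the logit x:
    # -y*log(sigmoid(x)) - (1-y)*log(1 - sigmoid(x)) = log(1 + exp(x)) - y*x
    return np.logaddexp(0.0, x) - y * x

def grad_hess(y, x):
    # analytic first and second derivatives with respect to the logit
    p = np.exp(-np.logaddexp(0.0, -x))  # sigmoid(x)
    return p - y, p * (1.0 - p)

y = np.array([0.0, 1.0, 1.0, 0.0])
x = np.array([-1.5, -0.3, 0.8, 2.0])
eps = 1e-4

# central finite differences for the first and second derivative
num_grad = (logloss(y, x + eps) - logloss(y, x - eps)) / (2 * eps)
num_hess = (logloss(y, x + eps) - 2 * logloss(y, x) + logloss(y, x - eps)) / eps**2

grad, hess = grad_hess(y, x)
```

With the minus sign in place, the numerical and analytic values match. Without it, both derivatives flip sign, so the Hessian becomes negative, which is consistent with the observation that the un-negated form does not work.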
