
It seems that if lightgbm.train is used with an initial score (init_score), it cannot boost on top of that score.

Here is a simple example:

    params = {"learning_rate": 0.1,"metric": "binary_logloss","objective": "binary", 
              "boosting_type": "gbdt","num_iterations": 5, "num_leaves": 2 ** 2,
              "max_depth": 2, "num_threads": 1, "verbose": 0, "min_data_in_leaf": 1}

    x = pd.DataFrame([[1, 0.1, 0.3], [1, 0.1, 0.3], [1, 0.1, 0.3],
                      [0, 0.9, 0.3], [0, 0.9, 0.3], [0, 0.9, 0.3]], columns=["a", "b", "prob"])
    y = pd.Series([0, 1, 0, 0, 1, 0])

    # Baseline: train without any init_score.
    d_train = lgb.Dataset(x, label=y)
    model = lgb.train(params, d_train)
    y_pred_default = model.predict(x, raw_score=False)

In the case above no init_score is used, and the predictions are correct: y_pred_default = [0.33333333, ..., 0.33333333], which is exactly the base rate of the labels (2 positives out of 6 samples).
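As a quick sanity check (a sketch using the variables above, with numpy imported as np): without an init_score, LightGBM's binary objective boosts from the average by default, so the raw score should simply be the logit of the label mean:

    import numpy as np

    # expit(raw score) should recover the label mean, here 2/6 = 1/3.
    y_pred_raw_default = model.predict(x, raw_score=True)
    print(np.allclose(scipy.special.expit(y_pred_raw_default), y.mean()))  # True

This matches the y_pred_default values shown above.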

    # Same data, but now with the logit of "prob" as init_score.
    d_train = lgb.Dataset(x, label=y, init_score=scipy.special.logit(x["prob"]))
    model = lgb.train(params, d_train)
    y_pred_raw = model.predict(x, raw_score=True)

In this part we treat column "prob" of x as our initial guess (perhaps produced by some other model). We apply the logit and use it as the initial score. However, the model cannot improve on it and the boosting always returns 0: y_pred_raw = [0, 0, 0, 0, 0, 0]
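To check that the trees themselves are empty, rather than their contributions merely cancelling out, the booster can be inspected directly (trees_to_dataframe is part of the Booster API in recent LightGBM versions):

    # One row per node: if each tree consists of a single zero-valued leaf,
    # the boosting genuinely added nothing on top of init_score.
    print(model.trees_to_dataframe())

Given the all-zero y_pred_raw above, I would expect every tree here to be trivial.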

    # predict() returns only the boosted part, so the init_score has to be
    # added back manually before converting to a probability.
    y_pred_raw_with_init = scipy.special.logit(x["prob"]) + y_pred_raw
    y_pred = scipy.special.expit(y_pred_raw_with_init)

The part above shows what I believe is the correct way to translate the initial scores together with the boosting back to probabilities. Since the boosted part is zero, y_pred yields [0.3, ..., 0.3], which is just our initial probability.
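For reference, this post-processing can be wrapped in a small helper (a hypothetical sketch; predict does not add the training init_score back, so the offset has to be supplied again at prediction time):

    def predict_proba_with_offset(booster, data, offset_logit):
        # Hypothetical helper: combine the external offset (the logits used
        # as init_score during training) with the boosted raw score.
        raw = booster.predict(data, raw_score=True)
        return scipy.special.expit(offset_logit + raw)

    y_pred = predict_proba_with_offset(model, x, scipy.special.logit(x["prob"]))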
