
I am trying to optimize a LightGBM model using Optuna.

Reading the docs, I noticed that there are two approaches that can be used, as described here: LightGBM Tuner: New Optuna Integration for Hyperparameter Optimization.

The first approach uses the "standard" way of optimizing with Optuna (objective function + trials), while the second one wraps everything inside the .train() function. The first one basically tries combinations of hyperparameter values, while the second one optimizes the hyperparameters following a step-wise approach.

The two approaches are shown in the following code examples from the Optuna GitHub repository:

  1. First approach
  2. Second approach

Both examples perform the same optimization over the same parameters (the parameters optimized by the second approach are described here), but in different ways (combinatorial vs. step-wise).
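
For context, the second approach essentially replaces lightgbm.train with the tuner's drop-in train function. A rough sketch of what that looks like (assuming X_train/X_val/y_train/y_val splits; this is not my actual code):

    import optuna.integration.lightgbm as lgb

    dtrain = lgb.Dataset(X_train, label=y_train)
    dval = lgb.Dataset(X_val, label=y_val)

    params = {"objective": "binary", "metric": "binary_logloss", "verbosity": -1}

    # The tuner optimizes lambda_l1, lambda_l2, num_leaves, feature_fraction,
    # bagging_fraction, bagging_freq and min_child_samples step-wise.
    booster = lgb.train(params, dtrain, valid_sets=[dval])
    print(booster.params)  # tuned hyperparameters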

My question is:

  1. Is it possible to specify a custom evaluation metric in the second approach? In the first one I can easily replace the accuracy used in the GitHub example with any custom metric.
    As an example I could write:

     import lightgbm as lgb
     import numpy as np
     import sklearn.datasets
     import sklearn.metrics
     from sklearn.model_selection import train_test_split
    
     import optuna
    
     def my_eval_metric(valid_y, pred_labels):
         # my custom metric
         ..........
         ..........
    
         return my_metric
    
     def objective(trial):
         data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
         train_x, valid_x, train_y, valid_y = train_test_split(data, target, test_size=0.25)
         dtrain = lgb.Dataset(train_x, label=train_y)
    
         param = {
             "objective": "binary",
             "metric": "binary_logloss",
             "verbosity": -1,
             "boosting_type": "gbdt",
             "lambda_l1": trial.suggest_float("lambda_l1", 1e-8, 10.0, log=True),
             "lambda_l2": trial.suggest_float("lambda_l2", 1e-8, 10.0, log=True),
             "num_leaves": trial.suggest_int("num_leaves", 2, 256),
             "feature_fraction": trial.suggest_float("feature_fraction", 0.4, 1.0),
             "bagging_fraction": trial.suggest_float("bagging_fraction", 0.4, 1.0),
             "bagging_freq": trial.suggest_int("bagging_freq", 1, 7),
             "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
         }
    
         gbm = lgb.train(param, dtrain)
         preds = gbm.predict(valid_x)
         pred_labels = np.rint(preds)
          my_eval_metric_value = my_eval_metric(valid_y, pred_labels)
          return my_eval_metric_value
    
    
     if __name__ == "__main__":
         study = optuna.create_study(direction="maximize")
         study.optimize(objective, n_trials=100)
    
         print("Number of finished trials: {}".format(len(study.trials)))
    
         print("Best trial:")
         trial = study.best_trial
    
         print("  Value: {}".format(trial.value))
    
         print("  Params: ")
         for key, value in trial.params.items():
             print("    {}: {}".format(key, value))
    

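For instance, my_eval_metric above could be any function of the true and predicted labels; as a stand-in for my real metric, something like scikit-learn's f1_score would do:

    from sklearn.metrics import f1_score

    def my_eval_metric(valid_y, pred_labels):
        # Placeholder for my real custom metric: plain F1 on the hard labels.
        return f1_score(valid_y, pred_labels)
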
This code will return the parameters of the LightGBM model that maximize my custom metric. However, in the second approach I haven't been able to specify my own custom metric.

UPDATE: I managed to define my own custom metric and use it inside the second approach. A minimal reproducible example is the following (just pass in data split with scikit-learn's train_test_split):

from sklearn.metrics import average_precision_score, log_loss
import optuna.integration.lightgbm as lgb_sequential

def tune_lightGBM_sequential(X_train, X_val, y_train, y_val):
    
    def calculate_ctr(gt):
        positive = len([x for x in gt if x == 1])
        ctr = positive/float(len(gt))
        return ctr

    def compute_rce(preds, train_data):
        gt = train_data.get_label()
        cross_entropy = log_loss(gt, preds)
        data_ctr = calculate_ctr(gt)
        strawman_cross_entropy = log_loss(gt, [data_ctr for _ in range(len(gt))])
        rce = (1.0 - cross_entropy/strawman_cross_entropy)*100.0
        return ('rce', rce, True)

    def compute_avg_precision(preds, train_data):
        gt = train_data.get_label()
        avg_precision = average_precision_score(gt, preds)
        return ('avg_precision', avg_precision, True)
    
    params = {
        "objective": "binary",
        "metric": 'custom',
        "boosting_type": "gbdt",
        "verbose" : 2
    }
    
    dtrain = lgb_sequential.Dataset(X_train, label=y_train)
    dval = lgb_sequential.Dataset(X_val, label=y_val)
    
    print('Starting training lightGBM sequential')
    model = lgb_sequential.train(
        params,
        dtrain,
        valid_sets=[dtrain, dval],
        verbose_eval=True,
        num_boost_round=2,
        early_stopping_rounds=100,
        feval=[compute_rce, compute_avg_precision],
    )
    
    return model.params
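
For completeness, this is roughly how I call it (the breast cancer dataset is just a stand-in for my real data):

    import sklearn.datasets
    from sklearn.model_selection import train_test_split

    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(data, target, test_size=0.25)

    best_params = tune_lightGBM_sequential(X_train, X_val, y_train, y_val)
    print(best_params)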

However, Optuna doesn't seem to be able to select the best trial based on my custom metrics; in fact, I get the following error:

    [W 2021-05-16 15:56:48,759] Trial 0 failed because of the following error: KeyError('custom')
    Traceback (most recent call last):
      File "C:\Users\Mattia\anaconda3\envs\rec_sys_challenge\lib\site-packages\optuna\_optimize.py", line 217, in _run_trial
        value_or_values = func(trial)
      File "C:\Users\Mattia\anaconda3\envs\rec_sys_challenge\lib\site-packages\optuna\integration\_lightgbm_tuner\optimize.py", line 251, in __call__
        val_score = self._get_booster_best_score(booster)
      File "C:\Users\Mattia\anaconda3\envs\rec_sys_challenge\lib\site-packages\optuna\integration\_lightgbm_tuner\optimize.py", line 118, in _get_booster_best_score
        val_score = booster.best_score[valid_name][metric]
    KeyError: 'custom'

It seems to be an issue with the library (you can find more here: GitHub Issue). I tried many of the proposed solutions, but none of them worked.

Any help?

Mattia Surricchio
  • How about setting the value of metric to "None", i.e. changing `"metric": 'custom',` to `"metric": "None",`? – ferdy Nov 18 '21 at 02:54
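
For reference, the change suggested in the comment would amount to this in the params dict above (I haven't verified yet whether it fixes the KeyError):

    params = {
        "objective": "binary",
        "metric": "None",  # suggested in the comment, instead of 'custom'
        "boosting_type": "gbdt",
        "verbose": 2,
    }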
