
Questions:

  1. The first question is probably extremely stupid, but I will ask anyway: are pruning and early stopping the same thing in the example below, or are they two separate options controlling two separate processes?
  2. My target is imbalanced, so how can I use a custom evaluation metric here, such as balanced accuracy, instead of 'binary_logloss'?
  3. When I get the optimal parameters, 'n_estimators' will still equal 999999. Using an "infinite" number of estimators and pruning with early stopping is recommended for an imbalanced target, which is why it is so high. How do I fit the final model with the optimal n_estimators after pruning?

Thank you very much for helping me out with this; I am quite frustrated.

import numpy as np
import optuna
from lightgbm import LGBMClassifier, early_stopping
from optuna.integration import LightGBMPruningCallback
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold

def objective(trial, X, y):
    param_grid = {
        # "device_type": trial.suggest_categorical("device_type", ['gpu']),
        "n_estimators": trial.suggest_categorical("n_estimators", [999999]),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3),
        "num_leaves": trial.suggest_int("num_leaves", 20, 3000, step=20),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        "min_data_in_leaf": trial.suggest_int("min_data_in_leaf", 200, 10000, step=100),
        "lambda_l1": trial.suggest_int("lambda_l1", 0, 100, step=5),
        "lambda_l2": trial.suggest_int("lambda_l2", 0, 100, step=5),
        "min_gain_to_split": trial.suggest_float("min_gain_to_split", 0, 15),
        "bagging_fraction": trial.suggest_float(
            "bagging_fraction", 0.2, 0.95, step=0.1
        ),
        "bagging_freq": trial.suggest_categorical("bagging_freq", [1]),
        "feature_fraction": trial.suggest_float(
            "feature_fraction", 0.2, 0.95, step=0.1
        ),
    }

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1121218)

    cv_scores = np.empty(5)
    for idx, (train_idx, test_idx) in enumerate(cv.split(X, y)):
        X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
        y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

        model = LGBMClassifier(
            objective="binary",
            **param_grid,
            n_jobs=-1,
            scale_pos_weight=len(y_train) / y_train.sum()
        )
        
        model.fit( 
            X_train,
            y_train,
            eval_set=[(X_test, y_test)],
            eval_metric="binary_logloss", # replace this with e.g. balanced accuracy or f1
            callbacks=[
                LightGBMPruningCallback(trial, "binary_logloss"), # replace this with e.g. balanced accuracy or f1
                early_stopping(100, verbose=False)
            ], 
        )
        preds = model.predict(X_test)
        cv_scores[idx] = balanced_accuracy_score(y_test, preds)
    
    loss = 1 - np.nanmedian(cv_scores)
    return loss

Run:

study = optuna.create_study(direction="minimize", study_name="LGBM Classifier")
func = lambda trial: objective(trial, X_train, y_train)
study.optimize(func, n_trials=1)

Fit the final model. But here I don't want to fit with n_estimators=999999; I want to fit with the optimal number of estimators found during tuning:

model = LGBMClassifier(
    objective="binary",
    **study.best_params,
    n_jobs=-1,
    scale_pos_weight=len(y) / y.sum()
)

1 Answer


So after a day of experimenting I can answer my own questions:

  1. The LGBM pruning defined by LightGBMPruningCallback(trial, "your_metric") is NOT the same thing as the early stopping procedure. The pruning callback essentially skips evaluating the remaining cv folds within a given trial (i.e. for a given set of hyperparameters) if the intermediate metric is clearly unsatisfactory (e.g. a low balanced accuracy).

  2. This was very annoying and the solution is not well documented, but it is to set metric='custom' in LGBMClassifier, then define the metric in a function and pass eval_metric=your_function; see the code below.

  3. There may be a way to retrieve n_estimators for the optimal trial (best params) directly; one option is sketched right after this list. I solved it instead by fitting the final model with early stopping, see the code further down.
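
For completeness, here is a minimal sketch of the retrieval route I did not take. It assumes a finished study like the one created further down; the split names X_tr / X_val / y_tr / y_val are placeholders of mine, not variables from the code below. After fitting with the early_stopping callback, the sklearn wrapper exposes the number of boosting rounds actually kept as best_iteration_:

# Sketch only: re-fit once with the tuned parameters and read off the
# early-stopped round count. X_tr / X_val / y_tr / y_val are hypothetical splits.
final_model = LGBMClassifier(objective="binary", **study.best_params, n_jobs=-1)
final_model.fit(
    X_tr,
    y_tr,
    eval_set=[(X_val, y_val)],
    callbacks=[early_stopping(100, verbose=False)],
)
optimal_n_estimators = final_model.best_iteration_  # trees kept after early stopping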

CODE

Define a custom metric:

def custom_metric(y_true, y_hat):
    # LightGBM's sklearn API expects a custom eval function to return a
    # (name, value, is_higher_better) tuple. With a binary objective, y_hat
    # holds predicted probabilities, so round them to hard labels first.
    higher_is_better = True
    y_hat_label = np.round(y_hat)
    balanced_accuracy = balanced_accuracy_score(y_true, y_hat_label)
    return 'balanced_accuracy', balanced_accuracy, higher_is_better
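
A quick sanity check of the metric in isolation (the toy arrays below are made up purely for illustration):

# y_true = [0, 1, 1]; the probabilities round to [0, 1, 0], so class 0 has
# recall 1.0 and class 1 has recall 0.5, giving a balanced accuracy of 0.75.
name, value, maximize = custom_metric(np.array([0, 1, 1]), np.array([0.2, 0.8, 0.3]))
print(name, value, maximize)  # -> balanced_accuracy 0.75 True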

Define the objective function (important changes with respect to my question above are commented):

def objective(trial, X, y):
    param_grid = {
        "n_estimators": trial.suggest_categorical("n_estimators", [999999]),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3),
        "num_leaves": trial.suggest_int("num_leaves", 20, 3000, step=20),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        "min_data_in_leaf": trial.suggest_int("min_data_in_leaf", 200, 10000, step=100),
        "lambda_l1": trial.suggest_int("lambda_l1", 0, 100, step=5),
        "lambda_l2": trial.suggest_int("lambda_l2", 0, 100, step=5),
        "min_gain_to_split": trial.suggest_float("min_gain_to_split", 0, 15),
        "bagging_fraction": trial.suggest_float(
            "bagging_fraction", 0.2, 0.95, step=0.1
        ),
        "bagging_freq": trial.suggest_categorical("bagging_freq", [1]),
        "feature_fraction": trial.suggest_float(
            "feature_fraction", 0.2, 0.95, step=0.1
        ),
    }

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1121218)

    cv_scores = np.empty(5)
    for idx, (train_idx, test_idx) in enumerate(cv.split(X, y)):
        X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
        y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

        model = LGBMClassifier(
            metric='custom', #THIS HAS CHANGED (REF QUESTION 2)!
            objective="binary",
            **param_grid,
            n_jobs=-1,
            scale_pos_weight=len(y_train) / y_train.sum()
        )

        model.fit( 
            X_train,
            y_train,
            eval_set=[(X_test, y_test)],
            eval_metric=[custom_metric], # THIS HAS CHANGED (REF QUESTION 2)!
            callbacks=[
                LightGBMPruningCallback(trial, "balanced_accuracy"),  # THIS HAS CHANGED (REF QUESTION 2)!
                early_stopping(100, verbose=True),
            ],  # Add a pruning callback
        )
        preds = model.predict(X_test)
        cv_scores[idx] = balanced_accuracy_score(y_test, preds)
    
    score = np.nanmedian(cv_scores)
    return score

The optimization:

study = optuna.create_study(direction="maximize", study_name="LGBM Classifier")
func = lambda trial: objective(trial, X_train, y_train)
study.optimize(func, n_trials=10)
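
To see what the search found, the standard Optuna attributes can be printed afterwards:

print("Best median balanced accuracy:", study.best_value)
print("Best parameters:", study.best_params)  # note: still contains n_estimators=999999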

And finally, fitting the final model (i.e. the answer to question 3). I solved this by using early stopping in the final fit as well:

model = LGBMClassifier(
    objective="binary",
    metric='custom', # THIS HAS CHANGED (REF QUESTION 2)!
    **study.best_params,
    n_jobs=-1,
    scale_pos_weight=len(y) / y.sum()
)

model.fit(
    X_train,
    y_train,
    eval_set=[(X_test, y_test)],
    eval_metric=custom_metric,
    callbacks=[
        # The early_stopping callback replaces the old early_stopping_rounds
        # fit argument (deprecated/removed in recent LightGBM versions);
        # passing both is redundant.
        early_stopping(100, verbose=True),
    ],
)
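
After this fit you can check how many trees survived early stopping (as opposed to the nominal 999999) and evaluate the model; once early stopping has fired, LightGBM's sklearn wrapper uses the best iteration for predict() by default:

print("Trees actually used:", model.best_iteration_)
preds = model.predict(X_test)  # predicts with the best iteration automatically
print("Balanced accuracy:", balanced_accuracy_score(y_test, preds))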

This algorithm applies early stopping to each LGBM model fitted on each fold within each trial (i.e. each combination of hyperparameters).

In addition, it prunes (i.e. stops) certain trials whose metric scores are unsatisfactory before the algorithm has been applied to all five folds. Some trials will be stopped very early.
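
If you want control over how aggressively Optuna cuts trials, a pruner can be passed explicitly when creating the study. MedianPruner is Optuna's default anyway; n_startup_trials=5 below is just an illustrative choice, not something from my code above:

# Prune a trial once its intermediate balanced accuracy falls below the
# median of previous trials, after 5 unpruned warm-up trials.
study = optuna.create_study(
    direction="maximize",
    study_name="LGBM Classifier",
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5),
)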

After the search is done, it continues by fitting the final model. The final fit uses early stopping as well (note that I use a different evaluation set in the final fit).

And that's it, have a great day :)
