Answer
This error occurs because you used early stopping during the grid search but then did not use early stopping when fitting the best model on the full dataset.
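As a minimal sketch of that failure mode (my assumption of what happened, not your exact code), an estimator that carries early_stopping_rounds will fail when fit() is called without an eval_set, because LightGBM has nothing to evaluate for early stopping:

from lightgbm import LGBMClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

# early_stopping_rounds passed here is stored on the estimator
clf = LGBMClassifier(random_state=42, early_stopping_rounds=10)

# no eval_set, so there is nothing to evaluate for early stopping,
# and LightGBM raises a ValueError about needing at least one dataset
# and eval metric for evaluation
clf.fit(X, y)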
Some keyword arguments you pass into LGBMClassifier are added to the params in the model object produced by training, including early_stopping_rounds. To disable early stopping, you can use set_params().
best_model = grid_search.best_estimator_
# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)
# disable early stopping by setting early_stopping_rounds to None
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#
best_model.fit(X_train, y_train)
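As a quick optional check (my addition), you can confirm the parameter is actually cleared before refitting:

# confirm early stopping is now disabled on the estimator
assert best_model.get_params()["early_stopping_rounds"] is None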
More Details
I made some assumptions to turn your question into a minimal reproducible example. In the future, I recommend doing that when you ask questions here. It will help you get better, faster help.
I installed lightgbm 3.1.0 with pip install lightgbm==3.1.0. I'm using Python 3.8.3 on Mac.
Things I changed from your example to make it an easier-to-use reproduction:
- removed commented code
- cut the number of iterations to [10, 100] and num_leaves to [8, 10] so training would run much faster
- added imports
- added a specific dataset and code to produce it repeatably
Reproducible example
from lightgbm import LGBMClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
param_grid = {
    'n_estimators': [10, 100],
    'boosting_type': ['gbdt'],
    'num_leaves': [8, 10],
    'subsample': [0.8, 0.95],
    'is_unbalance': [True, False],
    'min_split_gain': [0.01, 0.02, 0.05]
}

lgb = LGBMClassifier(
    random_state=42,
    early_stopping_rounds=10,
    eval_metric='auc',
    verbose_eval=20
)

grid_search = GridSearchCV(
    lgb,
    param_grid=param_grid,
    scoring='roc_auc',
    cv=5,
    n_jobs=-1,
    verbose=1
)

X, y = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.1,
    random_state=42
)

grid_search.fit(
    X_train,
    y_train,
    eval_set=(X_test, y_test)
)
best_model = grid_search.best_estimator_
# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)
# disable early stopping by setting early_stopping_rounds to None
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#
best_model.fit(X_train, y_train)
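If you want to sanity-check the refit model, something like this (my addition, using sklearn.metrics.roc_auc_score) scores it on the held-out split:

from sklearn.metrics import roc_auc_score

# probability of the positive class from the refit model (no early stopping)
test_pred = best_model.predict_proba(X_test)[:, 1]
print(roc_auc_score(y_test, test_pred))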