I am trying to use LightGBM as a multi-output predictor as suggested here. I am trying to forecast values for thirty consecutive days. I have a panel dataset so I can't use the traditional time series approaches.
I have a very large dataset, so training takes too long without early stopping. I am therefore trying to pass the eval_set, early_stopping_rounds and eval_metric parameters as shown below:
from lightgbm import LGBMRegressor
from sklearn.multioutput import MultiOutputRegressor

# Parameters for the base LightGBM estimator
hyper_params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': ['l1','l2'],
    'learning_rate': 0.01,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.7,
    'bagging_freq': 10,
    'verbose': 0,
    "max_depth": 8,
    "num_leaves": 128,
    "max_bin": 512,
    "num_iterations": 10000
}

# Keyword arguments intended for the base estimator's fit()
lgbc_fit_params = {
    'early_stopping_rounds' : 300,
    'eval_set': (X_test, y_test_array),
    'eval_metric':'l1'
}

# LGBMRegressor is imported directly, so no lgb. prefix is needed
gbm = LGBMRegressor(**hyper_params)
regr_multiglb = MultiOutputRegressor(gbm)
regr_multiglb.fit(X_train, y_train_array, **lgbc_fit_params)
Here, both y_train_array and y_test_array are 2-D NumPy arrays with shapes (1953395, 30) and (331003, 30), respectively.
When I run this code, I get the following error:
When I call fit without **lgbc_fit_params, the code runs without errors.
Any suggestions on how to pass the base estimator's (LightGBM) fit parameters into the wrapper?
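For reference, here is a minimal sketch of one possible workaround. It assumes the failure comes from MultiOutputRegressor forwarding the same fit keyword arguments, including the 2-D eval_set targets, to every per-column model, each of which is trained on a 1-D target. The sketch skips the wrapper and fits one model per target column, pairing each training column with the matching validation column. The helper names fit_per_target, predict_per_target and make_estimator are hypothetical, not part of LightGBM or scikit-learn.

```python
import numpy as np


def fit_per_target(make_estimator, X_train, Y_train, X_val, Y_val, **fit_kwargs):
    """Fit one independent model per target column.

    make_estimator is a zero-argument factory (hypothetical name) that returns
    a fresh estimator whose fit() accepts an eval_set keyword, as
    LGBMRegressor.fit does. Extra keyword arguments are forwarded to fit().
    """
    models = []
    for j in range(Y_train.shape[1]):
        model = make_estimator()
        # Pair the j-th training column with the j-th validation column,
        # so both targets are 1-D and refer to the same forecast day.
        model.fit(
            X_train,
            Y_train[:, j],
            eval_set=[(X_val, Y_val[:, j])],
            **fit_kwargs,
        )
        models.append(model)
    return models


def predict_per_target(models, X):
    """Stack per-column predictions into an (n_samples, n_targets) array."""
    return np.column_stack([m.predict(X) for m in models])


# Possible usage with the names from the question (not executed here):
# models = fit_per_target(
#     lambda: LGBMRegressor(**hyper_params),
#     X_train, y_train_array, X_test, y_test_array,
#     eval_metric='l1',
# )
# y_pred = predict_per_target(models, X_test)
```

How early stopping is requested depends on the installed LightGBM version: older releases accept early_stopping_rounds=300 as a fit keyword, while from version 4.0 onward it is passed as callbacks=[lightgbm.early_stopping(300)]. Either can be forwarded through fit_kwargs in the sketch above.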