
I want to do cross-validation for a LightGBM model with lgb.Dataset and use early_stopping_rounds. The analogous approach works without a problem with XGBoost's xgboost.cv. I'd prefer not to use scikit-learn's GridSearchCV, because it supports neither early stopping nor lgb.Dataset.

import lightgbm as lgb
from sklearn.metrics import mean_absolute_error

# dftrain / ytrain hold the training features and target
dftrainLGB = lgb.Dataset(data=dftrain, label=ytrain, feature_name=list(dftrain))

params = {'objective': 'regression'}

cv_results = lgb.cv(
    params,
    dftrainLGB,
    num_boost_round=100,
    nfold=3,
    metrics='mae',
    early_stopping_rounds=10,
)

The task is regression, but the code above throws the following error:

Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

Does LightGBM support regression, or did I supply the wrong parameters?
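
For reference, here is a sketch of the analogous xgboost.cv call that runs without a problem for me (assuming `dtrain` is an `xgboost.DMatrix` built from the same data, and using `'reg:linear'`, the regression objective name in the XGBoost version I used):

import xgboost as xgb

# dtrain is assumed to be built from the same training data as above
dtrain = xgb.DMatrix(dftrain, label=ytrain)

cv_results_xgb = xgb.cv(
    {'objective': 'reg:linear'},  # regression objective (older XGBoost naming)
    dtrain,
    num_boost_round=100,
    nfold=3,
    metrics='mae',
    early_stopping_rounds=10,
)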

– Marius

1 Answer


By default, the `stratified` parameter of `lightgbm.cv` is `True`. According to the documentation:

stratified (bool, optional (default=True)) – Whether to perform stratified sampling.

But stratified sampling works only with classification problems, so for regression you need to set it to `False`:

cv_results = lgb.cv(
    params,
    dftrainLGB,
    num_boost_round=100,
    nfold=3,
    metrics='mae',
    early_stopping_rounds=10,
    # this is what I added: plain (non-stratified) folds for regression
    stratified=False,
)

Now it's working.
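
If you also want the number of boosting rounds that early stopping picked, the length of the returned metric history gives it. A minimal sketch, assuming the LightGBM 2.x-era API used here, where lgb.cv returns a dict keyed `l1-mean` / `l1-stdv` (newer versions prefix the keys, e.g. `valid l1-mean`, and take early stopping as a callback):

# cv_results maps each metric name to its per-round history; with
# early stopping the lists are truncated at the best iteration.
best_rounds = len(cv_results['l1-mean'])
best_mae = cv_results['l1-mean'][-1]
print('best num_boost_round:', best_rounds, '| CV MAE:', best_mae)

# Retrain on the full training set for that many rounds.
final_model = lgb.train(params, dftrainLGB, num_boost_round=best_rounds)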

– Vivek Kumar
- Interesting. Looks like it was swapped to True [here](https://github.com/Microsoft/LightGBM/pull/734). One to keep in mind for the future! OP - it also looks like `shuffle=True` is the default, so be careful comparing to scikit-learn, where `shuffle=False` is the default for CV! – Stev Apr 11 '18 at 12:50
- Thanks, it's very strange that stratified is True by default, because you can't run a regression with it. But now it works! Another question: if I specify *metrics='mae'*, then *xgb.cv* returns *test-mae-mean, test-mae-std, train-mae-mean, train-mae-std*, but if I do that with *lgb.cv*, it returns only *l1-mean* and *l1-stdv*. Why doesn't it return the mean absolute error? If I understand correctly, l1 stands for lasso regression (regularization)? – Marius Apr 11 '18 at 12:56
- @Marius, L1 norm and MAE are the same thing. See the [LightGBM docs](http://testlightgbm.readthedocs.io/en/latest/Parameters.html#metric-parameters) or the [scikit-learn docs](http://scikit-learn.org/stable/modules/model_evaluation.html). – Stev Apr 11 '18 at 14:38
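
A quick way to confirm the alias from the last comment, as a sketch assuming the same `params` and `dftrainLGB` as above: `'mae'` and `'l1'` name the same LightGBM metric, so both report under the canonical `l1` key.

# 'mae' is an alias for 'l1'; LightGBM reports under the canonical name
for m in ('mae', 'l1'):
    res = lgb.cv(params, dftrainLGB, num_boost_round=20,
                 nfold=3, metrics=m, stratified=False)
    print(m, '->', sorted(res))  # both print ['l1-mean', 'l1-stdv']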