
I want to train a regression model using LightGBM, and the following code works fine:

import lightgbm as lgb
from sklearn.metrics import mean_absolute_error

d_train = lgb.Dataset(X_train, label=y_train)
params = {}
params['learning_rate'] = 0.1
params['boosting_type'] = 'gbdt'
params['objective'] = 'gamma'
params['metric'] = 'l1'
params['sub_feature'] = 0.5
params['num_leaves'] = 40
params['min_data'] = 50
params['max_depth'] = 30

lgb_model = lgb.train(params, d_train, 1000)

# Prediction
y_pred = lgb_model.predict(X_test)
mae_error = mean_absolute_error(y_test, y_pred)

print(mae_error)

But when I move on to GridSearchCV, I run into problems. I am not completely sure how to set it up correctly. I found some useful sources, for example here, but they seem to work with a classifier.

1st try:

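import lightgbm as lgb
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import GridSearchCV

# greater_is_better=False: sklearn negates the MAE so that higher scores are better
score_func = make_scorer(mean_absolute_error, greater_is_better=False)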

model = lgb.LGBMClassifier( 
    boosting_type="gbdt",
    objective='regression',
    is_unbalance=True, 
    random_state=10, 
    n_estimators=50,
    num_leaves=30, 
    max_depth=8,
    feature_fraction=0.5,  
    bagging_fraction=0.8, 
    bagging_freq=15, 
    learning_rate=0.01,    
)

params_opt = {'n_estimators':range(200, 600, 80), 'num_leaves':range(20,60,10)}
gridSearchCV = GridSearchCV(estimator = model, 
    param_grid = params_opt, 
    scoring=score_func)
gridSearchCV.fit(X_train,y_train)
# grid_scores_ was removed in scikit-learn 0.20; cv_results_ replaces it
gridSearchCV.cv_results_, gridSearchCV.best_params_, gridSearchCV.best_score_

This gives me a bunch of errors, ending with:

"ValueError: Unknown label type: 'continuous'"

UPDATE: I got the code to run by replacing LGBMClassifier with LGBMModel. Should I use LGBMRegressor instead, or does it not matter? (source: https://lightgbm.readthedocs.io/en/latest/_modules/lightgbm/sklearn.html)

kevins_1
Helen
  • You used `LGBMClassifier` but you defined `objective: 'regression'`. Try either `LGBMRegressor` if your predicted value is continuous OR `objective: binary` if your task is classification. – ipramusinto Oct 06 '18 at 15:26
  • Yes, thank you, I just figured that out :) Would you have any tips on what ranges the different parameters should have if I have a lot of data? – Helen Oct 06 '18 at 15:28
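The "Unknown label type: 'continuous'" error comes from scikit-learn's target validation, which LGBMClassifier appears to run (via scikit-learn) before training. The sketch below uses two small synthetic arrays (hypothetical stand-ins, not the question's data) with sklearn's `type_of_target` and `check_classification_targets` helpers to show why a real-valued target is rejected by a classifier:

```python
import numpy as np
from sklearn.utils.multiclass import check_classification_targets, type_of_target

# Hypothetical stand-ins: a real-valued regression target (like y_train in the
# question) and an integer-coded binary classification target.
y_continuous = np.array([0.7, 1.9, 3.2, 4.05])
y_binary = np.array([0, 1, 1, 0])

print(type_of_target(y_continuous))  # continuous
print(type_of_target(y_binary))      # binary

# Classifiers validate their targets with a check like this one; it rejects
# 'continuous' targets with a ValueError ("Unknown label type ...").
try:
    check_classification_targets(y_continuous)
except ValueError as exc:
    print("rejected:", exc)
```

The exact wording of the error message differs between scikit-learn versions, but the cause is the same: a classifier expects discrete class labels.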

1 Answer


First of all, it is unclear what the nature of your data is and thus which type of model fits better. You use the L1 metric, so I assume you have some sort of regression problem. If not, please correct me and explain why you use the L1 metric. If so, it is unclear why you use LGBMClassifier at all, since it serves classification problems (as @bakka has already pointed out).

Note that in practice LGBMModel is the same as LGBMRegressor (you can see this in the source code). However, there is no guarantee that this will remain so in the future. So if you want to write good, maintainable code, do not use the base class LGBMModel unless you know very well what you are doing, why, and what the consequences are.

Regarding the parameter ranges: see this answer on GitHub.

Mischa Lisovyi