4

I am trying to use LightGBM on a 4-class classification problem. The classes are heavily imbalanced, at roughly 2000:1:1:1.

In LightGBM, the parameters 'is_unbalance' and 'scale_pos_weight' only apply to binary classification.

import lightgbm as lgb

params = {
    'objective': 'multiclassova',
    'num_class': 4,
    'is_unbalance': True,
    'metric': 'multi_logloss',
    'max_depth': 2,
    'learning_rate': 0.15,
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,
    'bagging_freq': 4,
    'reg_alpha': 5,
    'reg_lambda': 3,
    'cat_smooth': 0,
    'num_iterations': 53,
}
# X_train, Y_train and category_feature are defined elsewhere
lgb_train = lgb.Dataset(X_train, Y_train,
                        categorical_feature=category_feature)
gbm = lgb.train(params, lgb_train)
Chao MI

2 Answers

1

To build a classifier with LightGBM you can use LGBMClassifier, the scikit-learn style wrapper. It has a class_weight parameter, which lets you handle imbalanced classes directly.

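For your particular problem you could do the following (class_weight is added at the end). Note that class_weight is an argument of the LGBMClassifier constructor, so these parameters should be passed as LGBMClassifier(**params) rather than to lgb.train:

params = {
    'objective': 'multiclassova',
    'num_class': 4,
    'is_unbalance': True,
    'metric': 'multi_logloss',
    'max_depth': 2,
    'learning_rate': 0.15,
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,
    'bagging_freq': 4,
    'reg_alpha': 5,
    'reg_lambda': 3,
    'cat_smooth': 0,
    'num_iterations': 53,
    # Keys are placeholders: use the actual label values found in Y_train.
    # Assuming class_label1 is the majority class, the rare classes get the larger weight.
    'class_weight': {'class_label1': 1, 'class_label2': 2000, 'class_label3': 2000, 'class_label4': 2000},
}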

For further information, have a look at the LGBMClassifier documentation, in particular the parameter descriptions: https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html
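As a rough illustration of how such weights could be derived instead of hand-picked, here is a minimal sketch of the inverse-frequency formula n_samples / (n_classes * class_count), which is the formula the LGBMClassifier documentation gives for class_weight='balanced'. The function name balanced_class_weights and the toy labels are made up for this example and are not part of LightGBM.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class_label: weight} using n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Toy labels mimicking the 2000:1:1:1 ratio from the question (made-up data).
y_toy = [0] * 2000 + [1, 2, 3]
weights = balanced_class_weights(y_toy)
print(weights)
```

The resulting dict has the shape class_weight expects, so it could be passed as LGBMClassifier(class_weight=weights). Passing class_weight='balanced' should give the same kind of weighting without computing it yourself.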

MGCHEM
-1

You can refer to the link below: https://github.com/Microsoft/LightGBM/blob/master/docs/Parameters.rst#weight_column

Details: parameter name: weight_column, default = "", type = int or string, aliases: weight

A few notes on usage:

  • used to specify the weight column
  • use a number for an index, e.g. weight=0 means column_0 is the weight
  • add the prefix name: for a column name, e.g. weight=name:weight
  • works only when loading data directly from a file
  • the index starts from 0 and does not count the label column when the passed type is int, e.g. when the label is column_0 and the weight is column_1, the correct parameter is weight=0
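Since weight_column only works when loading from a file, data already in memory needs per-row weights built another way. The sketch below maps each label to a class weight to get one weight per training row. The helper per_sample_weights, the weight values and the toy labels are hypothetical and only illustrate the idea.

```python
def per_sample_weights(labels, class_weights):
    """Map each label to its class weight, giving one weight per training row."""
    return [class_weights[label] for label in labels]


# Hypothetical class weights: class 0 is assumed to be the majority class,
# so the rare classes 1, 2 and 3 receive the larger weight.
class_weights = {0: 1.0, 1: 2000.0, 2: 2000.0, 3: 2000.0}

# Made-up labels standing in for Y_train.
y_toy = [0, 0, 1, 0, 2, 3]
row_weights = per_sample_weights(y_toy, class_weights)
print(row_weights)
```

A list like row_weights could then be passed through the weight argument of lgb.Dataset, e.g. lgb.Dataset(X_train, Y_train, weight=row_weights), which plays the same role as a weight column in a data file.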
saurabh kumar