
I want to use some LightGBM functions properly.

This is the standard approach; it's no different from any other classifier from sklearn:

  • define X, y
  • train_test_split
  • create classifier
  • fit on train
  • predict on test
  • compare

    import lightgbm as lgb
    from sklearn import metrics
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

    # here could be DecisionTreeClassifier(), RandomForestClassifier(), etc.
    model = lgb.LGBMClassifier()
    model.fit(X_train, y_train)

    predicted_y = model.predict(X_test)

    print(metrics.classification_report(y_test, predicted_y))
    

but LightGBM has its own functions like lgb.Dataset and Booster.

However, this kaggle notebook doesn't call LGBMClassifier at all! Why?

What is the standard order to call LightGBM functions and train models the 'lgbm' way?

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

    # why is this Dataset wrapper around X_train, y_train needed?
    d_train = lgbm.Dataset(X_train, y_train)

    # where is LGBMClassifier()?
    # (note: early_stopping_rounds also requires a validation set via valid_sets)
    bst = lgbm.train(params, d_train, 50, early_stopping_rounds=100)

    preds = bst.predict(X_test)  # predict takes features, not labels

Why does it start training right away, without a separate fit step?

kaban

1 Answer


LightGBM has a few different APIs, each with different method names (LGBMClassifier, Booster, train, etc.), different parameters, and sometimes different data types. That is why lgb.train does not need LGBMClassifier, but does need its own dataset type, lgb.Dataset. There is no right/wrong/standard way: all of them work well if used properly. https://lightgbm.readthedocs.io/en/latest/Python-API.html#training-api