
I want to use some LightGBM functions properly.

This is the standard approach; it's no different from any other classifier from sklearn:

  • define X, y
  • train_test_split
  • create classifier
  • fit on train
  • predict on test
  • compare

    import lightgbm as lgb
    from sklearn import metrics
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

    # here could be DecisionTreeClassifier(), RandomForestClassifier(), etc.
    model = lgb.LGBMClassifier()
    model.fit(X_train, y_train)

    predicted_y = model.predict(X_test)

    print(metrics.classification_report(y_test, predicted_y))
    

but LightGBM has its own functions like lgb.Dataset and Booster.

However, this kaggle notebook doesn't call LGBMClassifier at all! Why?

What is the standard order to call LightGBM functions and train models the 'lgbm' way?

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

    # why is this Dataset wrapper around X_train, y_train needed?
    d_train = lgbm.Dataset(X_train, y_train)

    # where is LGBMClassifier()?
    # (note: early_stopping_rounds also requires a validation set via valid_sets)
    bst = lgbm.train(params, d_train, 50, early_stopping_rounds=100)

    preds = bst.predict(X_test)  # predict takes features, not labels

Why does it start training right away, without a separate fit step?

kaban

1 Answer


LightGBM has a few different APIs, each with different method names (LGBMClassifier, Booster, train, etc.), different parameters, and sometimes different data types. That is why lgb.train does not need LGBMClassifier, but does need its own dataset type, lgb.Dataset. There is no right/wrong/standard way: all of them work well if used properly. https://lightgbm.readthedocs.io/en/latest/Python-API.html#training-api