Questions tagged [lightgbm]

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages: ... Support for parallel and GPU learning. Capable of handling large-scale data.

LightGBM is a high-performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks. It is under the umbrella of the DMTK (http://github.com/microsoft/dmtk) project of Microsoft.

676 questions
4 votes, 1 answer

Transform SHAP values from raw to native units with lightgbm Tweedie objective?

The utility of Shapley Additive Explanations (SHAP values) is to understand how each feature contributes to a model's prediction. For some objectives, such as regression with RMSE as an objective function, SHAP values are in the native units of the…
kdoherty
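A minimal sketch, assuming the contributions come from `Booster.predict(X, pred_contrib=True)` on a model trained with `objective="tweedie"`: LightGBM's Tweedie objective uses a log link, so those SHAP values (one column per feature plus a final expected-value column) add up to the raw log-scale score, not to the prediction. Exponentiating turns each value into an exact multiplicative factor in native units; any *additive* split in native units is a heuristic, and the proportional one below is only one option. The helper name `tweedie_shap_to_native` is hypothetical.

```python
import numpy as np


def tweedie_shap_to_native(contrib):
    """Convert raw (log-link) SHAP contributions into native-unit views.

    Assumes the layout of Booster.predict(X, pred_contrib=True): one column
    per feature, with the expected raw value (base value) in the last column.
    """
    contrib = np.asarray(contrib, dtype=float)
    phi, base = contrib[:, :-1], contrib[:, -1]
    raw = base + phi.sum(axis=1)       # raw score on the log scale
    prediction = np.exp(raw)           # native units via the inverse log link
    baseline = np.exp(base)            # native-unit prediction at the base value
    multipliers = np.exp(phi)          # exact multiplicative effect per feature
    # Heuristic additive split: share (prediction - baseline) in proportion to
    # the raw SHAP values. This is NOT a Shapley value computed in native units.
    total = phi.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        share = np.where(total != 0, phi / total, 0.0)
    additive = share * (prediction - baseline)[:, None]
    return prediction, baseline, multipliers, additive
```

The multiplicative view is exact (`baseline * multipliers.prod(axis=1)` equals the prediction). The additive view also sums to the prediction, but it can behave oddly for rows whose raw SHAP values nearly cancel.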
4 votes, 2 answers

Feature importance with LightGBM

I have trained a model using several algorithms, including Random Forest from scikit-learn and LightGBM, and these models perform similarly in terms of accuracy and other stats. The issue is the inconsistent behavior between these two algorithms in…
4 votes, 1 answer

Using optuna LightGBMTunerCV as starting point for further search with optuna

I'm trying to use LightGBM for a regression problem (mean absolute error/L1 loss, or something similar such as Huber or pseudo-Huber loss), and I primarily want to tune my hyperparameters. LightGBMTunerCV in optuna offers a nice starting point, but after that I'd…
Björn
4 votes, 1 answer

Multiprocessing Pooling and Lightgbm

I am trying to train completely independent tasks using multiprocessing pooling in Python, with LightGBM for training (I am not sure whether this is relevant to the problem). Here is the code: from sklearn.datasets import load_breast_cancer import pandas as…
4 votes, 1 answer

LightGBM early stopping with custom eval function and built-in loss function

I am using LightGBM for a binary classification project. I use the built-in 'logloss' as the loss function. However, I want to use early_stopping to stop the iterations when it yields the highest Precision_Recall AUC value. So I have implemented the…
David293836
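A minimal sketch of such an eval function, assuming scikit-learn's `average_precision_score` is an acceptable stand-in for "PR AUC" (it is a step-wise summary of the precision-recall curve, not a trapezoidal area). LightGBM expects a custom eval function to return `(name, value, is_higher_better)`; with a built-in objective such as `"binary"`, the `preds` it receives are already probabilities. The function name is hypothetical.

```python
import numpy as np
from sklearn.metrics import average_precision_score


def pr_auc_feval(preds, eval_data):
    """Custom LightGBM eval function returning (name, value, is_higher_better).

    `eval_data` is the lgb.Dataset being evaluated; its labels are read back
    with get_label(). Average precision is rank-based, so raw margins would
    give the same value as probabilities.
    """
    labels = eval_data.get_label()
    return "pr_auc", float(average_precision_score(labels, preds)), True
```

One way to use it: `lgb.train(params, train_set, valid_sets=[valid_set], feval=pr_auc_feval, callbacks=[lgb.early_stopping(50)])` with `"objective": "binary"` and `"metric": "None"` in `params`, so logloss still drives training while early stopping watches only `pr_auc`. The same function can also be passed as `feval` to `lgb.cv`.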
4 votes, 1 answer

How to implement a negative binomial loss function in Python to use in LightGBM?

I have a machine learning problem that I believe the negative binomial loss function would fit well, but the LightGBM package doesn't include it as a standard objective. I'm trying to implement it, but I don't know how to get the gradient and Hessian. Does anyone…
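A minimal sketch, assuming the NB2 parameterisation with a log link ($\mu = e^{f}$ for raw score $f$) and a fixed, known size parameter $r$ (dispersion $\alpha = 1/r$; the value 2.0 is an arbitrary placeholder). Dropping terms that depend only on $y$ and $r$, the per-row negative log-likelihood is $\ell(f) = -yf + (y+r)\log(r + e^{f})$, which gives $\partial\ell/\partial f = r(\mu - y)/(r + \mu)$ and $\partial^2\ell/\partial f^2 = r\mu(r + y)/(r + \mu)^2$. All names below are hypothetical.

```python
import numpy as np

NB_SIZE = 2.0  # hypothetical fixed size parameter r (dispersion alpha = 1 / r)


def nb_grad_hess(raw_score, y, r=NB_SIZE):
    """Gradient and Hessian of the NB2 negative log-likelihood w.r.t. the raw score."""
    mu = np.exp(raw_score)                  # log link: mean = exp(raw score)
    grad = r * (mu - y) / (r + mu)
    hess = r * mu * (r + y) / (r + mu) ** 2  # always positive for mu > 0
    return grad, hess


def nb_nll(raw_score, y, r=NB_SIZE):
    """NB2 negative log-likelihood up to terms depending only on y and r."""
    return -y * raw_score + (y + r) * np.log(r + np.exp(raw_score))


def negative_binomial_objective(preds, train_data):
    """Custom objective: LightGBM passes raw scores and the training Dataset."""
    return nb_grad_hess(preds, train_data.get_label())
```

In LightGBM 4.x a callable like `negative_binomial_objective` can be passed as `params["objective"]`; older versions took it through the `fobj` argument of `lgb.train`. With a custom objective, `booster.predict` returns raw scores, so apply `np.exp` to get the predicted mean.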
4 votes, 1 answer

Can you access scores for each boosting round in LightGBM.train()?

Basic info: lgbm.train() with early_stopping calculates the objective function and feval scores after each boosting round, and we can make it print them every verbose_eval rounds, like so: bst=lgbm.train(**params) [10] valid_0's binary_logloss:…
Mark_Anderson
4 votes, 0 answers

LightGBM very slow on AWS but not locally

I have an i7 9700 CPU @ 3.00GHz on my local machine, and I am able to tune and train my LightGBM model in around 5 hours. When I repeat the procedure on an AWS EC2 instance (c5.12xlarge), training is significantly slower and it can take up to…
4 votes, 2 answers

How to set the weight in multiclass (4 classes) classification in LightGBM for an imbalanced dataset?

I am trying to use LightGBM for a 4-class classification problem, but the 4 classes are imbalanced, at roughly 2000:1:1:1. In LightGBM, the params 'is_unbalance' and scale_pos_weight are only for binary classification. params = { …
Chao MI
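A minimal sketch of per-sample weights for the multiclass case, using the "balanced" formula `n_samples / (n_classes * count_of_class)`, so that every class carries the same total weight. The helper name is hypothetical.

```python
import numpy as np


def balanced_sample_weights(y):
    """Per-sample weights n_samples / (n_classes * count_of_class).

    This is the same formula scikit-learn uses for class_weight="balanced".
    """
    y = np.asarray(y)
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    class_weight = len(y) / (len(classes) * counts)
    return class_weight[inverse]  # map each sample to its class's weight
```

The result can be passed as `lgb.Dataset(X, label=y, weight=balanced_sample_weights(y))` with `"objective": "multiclass"` and `"num_class": 4`; with the scikit-learn wrapper, `LGBMClassifier(class_weight="balanced")` applies the same formula. At a 2000:1:1:1 ratio the minority weights become very large, so tempering them (for example with a square root) may be worth trying.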
4 votes, 1 answer

LightGBM hyperparameter tuning RandomizedSearchCV

I have a dataset with the following dimensions for training and testing sets: X_train = (58149, 9) y_train = (58149,) X_test = (24921, 9) y_test = (24921,) The code that I have for RandomizedSearchCV using LightGBM classifier is as follows: #…
Arun
4 votes, 1 answer

LightGBM fit throws "ValueError: Circular reference detected" with categorical feature from pd.cut

I have been using LightGBM models with great satisfaction, as I have big datasets with tens of features, millions of rows, and lots of categorical columns. I really like the way LightGBM can take a pandas DataFrame with categorical features…
Marcello
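A minimal sketch of one workaround, assuming the `pd.Interval` objects that `pd.cut` uses as category labels are what trips up LightGBM when it serialises the pandas categorical mapping: relabel the bins as plain strings before fitting. The helper name is hypothetical.

```python
import pandas as pd


def bin_as_string_category(values, bins):
    """Bin values with pd.cut, but use plain string category labels.

    Interval labels such as Interval(0, 10, closed="right") become "(0, 10]";
    values outside the bins stay missing (NaN) rather than becoming a category.
    """
    binned = pd.cut(pd.Series(values), bins=bins)
    return binned.cat.rename_categories(str)
```

A simpler alternative is `pd.cut(..., labels=False)`, which returns integer bin codes; those can then be declared categorical or used as an ordinal feature.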
4 votes, 1 answer

PySpark feature vector to allow NULL values

I would like to use a classifier in PySpark on a dataset that includes NULL values. The NULL values appear in features I have created, such as Success Percentage. I need to keep the NULL value, because I have shown via pandas that keeping the NULL…
Spainey
4 votes, 1 answer

Grid search with LightGBM regression

I want to train a regression model using LightGBM, and the following code works fine: import lightgbm as lgb d_train = lgb.Dataset(X_train, label=y_train) params = {} params['learning_rate'] = 0.1 params['boosting_type'] =…
Helen
4 votes, 1 answer

LightGBM: loading from json

I am trying to load a LightGBM.Booster from a JSON file pointer, and can't find an example online. import json ,lightgbm import numpy as np X_train = np.arange(0, 200).reshape((100, 2)) y_train = np.tile([0, 1], 50) tr_dataset =…
Sam Shleifer
4 votes, 1 answer

using lightgbm with average precision recall score

I am using LightGBM and would like to use average precision recall as a metric. I tried defining feval: cv_result = lgb.cv(params=params, train_set=lgb_train, feature_name=Rel_Feat_Names, feval=APS) where APS defined as: def APS(preds,…
Yochai Edlitz