Questions tagged [xgboost]

XGBoost is a library for constructing boosted tree models in R, Python, Java, Scala, and C++. Use this tag for issues specific to the package (e.g., input/output, installation, functionality).

Before using the XGBoost tag, try to test whether your issue is related specifically to the functionality of XGBoost. Often, problems arise from the surrounding model-building environment (such as R's caret or Python's scikit-learn), the quality of the data being used, or purely statistical concerns that might belong on Cross Validated.

2788 questions
11 votes · 1 answer

What is the use of DMatrix?

The docs say: Data Matrix used in XGBoost. DMatrix is an internal data structure that is used by XGBoost, which is optimized for both memory efficiency and training speed. You can construct DMatrix from multiple different sources of data. I get…
Snehangsu
11 votes · 2 answers

How to apply predict to xgboost cross validation

After some time searching Google, I feel this might be a nonsensical question, but here it goes. If I use the following code I can produce an xgb regression model, which I can then fit on the training set and use to evaluate the model: xgb_reg =…
jon
11 votes · 1 answer

I got this error 'DataFrame.dtypes for data must be int, float, bool or categorical'

I'm going to train this as an xgboost model. The 'start_time' and 'end_time' columns were in yyyy-mm-dd hh:mm:ss format. I changed them to strings using astype(str) and then to yyyymmddhhmmss format using regular expressions. xgb_model =…
mineral
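This error means some columns still have object (string) dtype; XGBoost accepts only numeric, bool, or categorical columns, so the datetimes must become numbers rather than strings. A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"start_time": ["2021-01-01 10:00:00", "2021-01-02 11:30:00"]})

# astype(str) yields an object column, which XGBoost rejects; convert to a
# numeric representation instead (here: unix seconds since the epoch)
df["start_time"] = pd.to_datetime(df["start_time"]).astype("int64") // 10**9

print(df["start_time"].dtype)  # int64
```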
11 votes · 0 answers

XGBoost model: train on GPU, run on CPU without GPU RAM allocation

How can I train an XGBoost model on a GPU but run predictions on CPU without allocating any GPU RAM? My situation: I create an XGBoost model (tree_method='gpu_hist') in Python with predictor='cpu_predictor', then I train it on GPU, then I save…
S.V
11 votes · 1 answer

XGBoost error - Unknown objective function reg:squarederror

I am training an xgboost model for a regression task and I passed the following parameters - params = {'eta':0.4, 'max_depth':5, 'colsample_bytree':0.6, 'objective':'reg:squarederror'} num_round = 10 xgb_model = xgboost.train(params, dtrain_x,…
Ankit Seth
11 votes · 1 answer

Cross-validation and parameters tuning with XGBoost and hyperopt

One way to do nested cross-validation with an XGB model would be: from sklearn.model_selection import GridSearchCV, cross_val_score from xgboost import XGBClassifier # Let's assume that we have some data for a binary classification # problem : X…
11 votes · 2 answers

python xgboost continue training on existing model

Let's say I build an xgboost model: bst = xgb.train(param0, dtrain1, num_round, evals=[(dtrain, "training")]) where param0 is a set of xgb parameters, dtrain1 is a DMatrix ready for training, and num_round is the number of boosting rounds. Then, I save the…
Eran Moshe
11 votes · 3 answers

How to get each individual tree's prediction in xgboost?

Using xgboost.Booster.predict I can only get the combined prediction of all the trees, or the predicted leaf index of each tree. But how can I get the prediction value of each individual tree?
K_Augus
11 votes · 3 answers

Xgboost dealing with imbalanced classification data

I have a dataset of some 20000 training examples, on which I want to do a binary classification. The problem is that the dataset is heavily imbalanced, with only around 1000 examples in the positive class. I am trying to use xgboost (in R) for doing my…
Vikash Balasubramanian
11 votes · 3 answers

xgboost plot importance figure size

How can I change the figure size of xgboost's plot_importance function? Trying to pass figsize=(10,20) fails with an unknown-attribute exception.
Georg Heiler
11 votes · 0 answers

Trying to use xgboost for pairwise ranking

Using the python API from the documentation of xgboost I am creating the train data by: dtrain = xgb.DMatrix(file_path) Here file_path is of libsvm format txt file. As I am doing pairwise ranking I am also inputting the length of the groups in the…
Mpizos Dimitris
11 votes · 3 answers

xgboost: handling of missing values for split candidate search

In Section 3.4 of the XGBoost paper, the authors explain how they handle missing values when searching for the best candidate split during tree growing. Specifically, they learn a default direction for those nodes whose splitting feature has missing…
pmarini
11 votes · 2 answers

xgb.plot_tree font size python

I make a picture as below: import matplotlib.pylab as plt %matplotlib inline from matplotlib.pylab import rcParams (xgboost code omitted) xgb.plot_tree(clf, num_trees=2) And I want to increase the font size: font = {'size' :…
Edward
11 votes · 9 answers

xgboost predict method returns the same predicted value for all rows

I've created an xgboost classifier in Python: train is a pandas dataframe with 100k rows and 50 features as columns. target is a pandas series xgb_classifier = xgb.XGBClassifier(nthread=-1, max_depth=3, silent=0, …
mistakeNot
10 votes · 1 answer

The default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'

I am trying to fit XGBClassifier to my dataset after hyperparameter tuning using Optuna, and I keep getting this warning: the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss' Below is my…
spectre