Questions tagged [xgboost]

XGBoost is a library for constructing boosted tree models in R, Python, Java, Scala, and C++. Use this tag for issues specific to the package (e.g., input/output, installation, functionality).

Before using the XGBoost tag, first check whether your issue is specific to the functionality of XGBoost. Often, problems arise from the surrounding model-building environment (such as R's caret or Python's scikit-learn), the quality of the data being used, or purely statistical concerns that might belong on Cross Validated.

2788 questions
19
votes
4 answers

How to extract decision rules (feature splits) from xgboost model in python3?

I need to extract the decision rules from my fitted xgboost model in Python. I use version 0.6a2 of the xgboost library, and my Python version is 3.5.2. My ultimate goal is to use those splits to bin variables (according to the splits). I did not come…
Artiga
  • 776
  • 2
  • 16
  • 37
18
votes
3 answers

Difference between original xgboost (Learning API) and sklearn XGBClassifier (Scikit-Learn API)

I use the xgboost sklearn interface below to create and train an xgb model. clf = xgb.XGBClassifier(n_estimators = 100, objective= 'binary:logistic',) clf.fit(x_train, y_train, early_stopping_rounds=10, eval_metric="auc", eval_set=[(x_valid,…
ybdesire
  • 1,593
  • 1
  • 20
  • 35
17
votes
5 answers

Python's Xgboost: ValueError('feature_names may not contain [, ] or <')

Python's implementation of XGBClassifier does not accept the characters [, ] or < as feature names. If that occurs, it raises the following: ValueError('feature_names may not contain [, ] or <') It would seem that the obvious solution would be…
sapo_cosmico
  • 6,274
  • 12
  • 45
  • 58
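One common workaround is to strip the offending characters from the column names before fitting. A small sketch with hypothetical column names:

```python
import re
import pandas as pd

df = pd.DataFrame({"age<30": [1, 0], "income[k]": [50, 70]})

# Replace each character xgboost rejects ([, ], <, >) with an underscore.
df.columns = [re.sub(r"[\[\]<>]", "_", c) for c in df.columns]
print(list(df.columns))  # ['age_30', 'income_k_']
```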
17
votes
3 answers

Grid Search and Early Stopping Using Cross Validation with XGBoost in SciKit-Learn

I am fairly new to scikit-learn and have been trying to hyper-parameter tune XGBoost. My aim is to use grid search to tune the model parameters and early stopping to control the number of trees and avoid overfitting. As I am…
George
  • 674
  • 2
  • 7
  • 19
17
votes
1 answer

Xgboost: what is the difference among bst.best_score, bst.best_iteration and bst.best_ntree_limit?

When I use xgboost to train my data for a two-class classification problem, I'd like to use early stopping to get the best model, but I'm confused about which value to use in my predict call, as early stopping returns three different attributes. For example,…
LancelotHolmes
  • 659
  • 1
  • 10
  • 31
17
votes
4 answers

R - XGBoost: Error building DMatrix

I am having trouble using XGBoost in R. I am reading a CSV file with my data: get_data = function() { #Loading Data path = "dados_eye.csv" data = read.csv(path) #Dividing into two groups train_porcentage = 0.05 train_lines =…
17
votes
2 answers

How to change size of plot in xgboost.plot_importance?

xgboost.plot_importance(model, importance_type='gain') I am not able to change the size of this plot. I want to save this figure at a proper size so that I can use it in a PDF. I want something similar to matplotlib's figsize.
dsl1990
  • 1,157
  • 5
  • 13
  • 25
17
votes
3 answers

convert python xgboost DMatrix to numpy ndarray or pandas DataFrame

I'm following an xgboost example from their main Git repository at https://github.com/dmlc/xgboost/blob/master/demo/guide-python/basic_walkthrough.py#L64 — in this example they read files directly into a DMatrix: dtrain =…
howard
  • 255
  • 1
  • 4
  • 12
16
votes
2 answers

The loss function and evaluation metric of XGBoost

I am confused now about the loss functions used in XGBoost. Here is how I feel confused: we have objective, which is the loss function to be minimized, and eval_metric, the metric used to evaluate the learning result. These two are totally…
Bs He
  • 717
  • 1
  • 10
  • 22
16
votes
0 answers

xgboost.plot_tree: binary feature interpretation

I've built an XGBoost model and seek to examine the individual estimators. For reference, this was a binary classification task with discrete and continuous input features. The input feature matrix is a scipy.sparse.csr_matrix. When I went to…
blacksite
  • 12,086
  • 10
  • 64
  • 109
16
votes
7 answers

Save SHAP summary plot as PDF/SVG

I'm currently working on a classification problem and want to create visualizations of feature importance. I use the Python XGBoost package which already provides feature importance plots. However, I found shap (https://github.com/slundberg/shap), a…
Roqua
  • 161
  • 1
  • 1
  • 4
16
votes
3 answers

How to know the number of trees created in XGBoost

I have a question about XGBoost. Do you know how to find the number of trees created in XGBoost? Unlike RandomForest, where the model maker decides how many trees are made, XGBoost basically continues to create trees until the loss function reaches…
kanam
  • 181
  • 2
  • 5
16
votes
1 answer

execinfo.h missing when installing xgboost in Cygwin

I've followed this tutorial to install the xgboost Python package within Cygwin64: https://www.ibm.com/developerworks/community/blogs/jfp/entry/Installing_XGBoost_For_Anaconda_on_Windows But when executing make in the dmlc-core directory…
mllamazares
  • 7,876
  • 17
  • 61
  • 89
16
votes
4 answers

Sklearn pass fit() parameters to xgboost in pipeline

Similar to How to pass a parameter to only one part of a pipeline object in scikit learn? I want to pass parameters to only one part of a pipeline. Usually, it should work fine like: estimator = XGBClassifier() pipeline = Pipeline([ ('clf',…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
16
votes
4 answers

REAL() can only be applied to a 'numeric', not a 'integer'

Though this question seems to be a duplicate, I'm posting it as none of the existing answers gave a solution relevant to my problem. dtrain<-xgb.DMatrix(data=data.matrix(train),label=data[t,c(31)]) Error in xgb.DMatrix(data = data.matrix(train), label = data[t,…
Shankar Pandala
  • 969
  • 2
  • 8
  • 28