Questions tagged [xgboost]

XGBoost is a library for constructing boosted tree models in R, Python, Java, Scala, and C++. Use this tag for issues specific to the package (e.g., input/output, installation, functionality).

Before using the XGBoost tag, try to test whether your issue is related specifically to the functionality of XGBoost. Often, problems arise from the surrounding model-building environment (such as R's caret or Python's scikit-learn), the quality of the data being used, or purely statistical concerns that might belong on Cross Validated.

2788 questions
15 votes, 3 answers

CPU faster than GPU using xgb and XGBClassifier

I apologize in advance as I am a beginner. I am trying out GPU vs CPU tests with XGBoost using xgb and XGBClassifier. The results are as follows: passed time with xgb (gpu): 0.390s; passed time with XGBClassifier (gpu): 0.465s; passed time…
cinzero • 151 • 1 • 4
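A sketch of why this result is expected rather than a bug: on small datasets the fixed cost of copying data to the GPU dominates, so the CPU often wins. The parameter dicts and `time_fit` helper below are illustrative stand-ins, not the asker's code; recent XGBoost releases select the GPU via `device="cuda"`, older ones via `tree_method="gpu_hist"`.

```python
from time import perf_counter

# Illustrative parameter dicts for a fair CPU-vs-GPU comparison:
cpu_params = {"tree_method": "hist", "device": "cpu"}
gpu_params = {"tree_method": "hist", "device": "cuda"}

def time_fit(train_fn):
    """Time one training run; pass a closure that calls xgb.train(...)."""
    start = perf_counter()
    train_fn()
    return perf_counter() - start
```

On a few thousand rows the host-to-device transfer alone can exceed the whole CPU fit, so CPU-faster-than-GPU timings like the asker's are normal; the gap usually reverses only with hundreds of thousands of rows or more.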
15 votes, 2 answers

feature_names mismatch in xgboost despite having same columns

I have a training set (X) and a test set (test_data_process) with the same columns in the same order, as indicated below. But when I do predictions = my_model.predict(test_data_process), it gives the following error: ValueError: feature_names mismatch:…
rcs • 6,713 • 12 • 53 • 75
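The usual cause is column *order*, not column names: the booster records feature names in training order and compares both. A minimal pandas sketch of the standard fix (the frame names are stand-ins for the asker's data):

```python
import pandas as pd

# Hypothetical training and test frames with the same columns in
# different order:
X_train = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
test_data_process = pd.DataFrame({"b": [7], "c": [8], "a": [9]})

# Reindex the test set to the training columns before calling predict():
test_aligned = test_data_process[X_train.columns]
print(list(test_aligned.columns))  # ['a', 'b', 'c']
```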
15 votes, 4 answers

Understanding num_classes for xgboost in R

I'm having a lot of trouble figuring out how to correctly set the num_classes for xgboost. I've got an example using the Iris data: df <- iris; y <- df$Species; num.class = length(levels(y)); levels(y) = 1:num.class; head(y); df <- df[,1:4]; y <-…
House • 195 • 1 • 1 • 5
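The parameter xgboost actually expects is spelled `num_class`, and the usual failure mode is labels that are not consecutive integers starting at 0. A pure-Python sketch of the required encoding, with Iris-like labels as stand-ins for the asker's data:

```python
# Labels must be integers 0 .. num_class-1, and num_class must equal the
# number of distinct labels.
species = ["setosa", "versicolor", "virginica", "setosa"]
classes = sorted(set(species))
num_class = len(classes)                       # value to pass as num_class
encoded = [classes.index(s) for s in species]  # 0-based integer codes
print(num_class, encoded)  # 3 [0, 1, 2, 0]
```

Note that the R snippet in the question maps levels to 1:num.class, i.e. 1-based codes, which is exactly the off-by-one that trips this check.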
15 votes, 3 answers

What is the output of XGBoost using 'rank:pairwise'?

I use the Python implementation of XGBoost. One of the objectives is rank:pairwise, and it minimizes the pairwise loss (documentation). However, it does not say anything about the range of the output. I see numbers between -10 and 10, but can it be…
Soerendip • 7,684 • 15 • 61 • 128
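The short answer is that rank:pairwise scores are unbounded: only their relative order within a query group is meaningful. If probability-like values are wanted anyway, one common ad hoc transform is the logistic sigmoid, sketched here with made-up scores:

```python
import math

def sigmoid(x):
    """Squash an unbounded ranking score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

scores = [-10.0, 0.0, 10.0]            # example raw ranking scores
probs = [sigmoid(s) for s in scores]   # order-preserving, bounded
```

The transform preserves ranking order but the resulting numbers are not calibrated probabilities.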
14 votes, 3 answers

Is the xgboost documentation wrong? (early stopping rounds and best and last iteration)

Below is a question about xgboost's early_stopping_rounds parameter and how it does, or does not, give the best iteration when it is the reason the fit ends. In the xgboost documentation, one can see in the scikit-learn API section (link) that…
Lyxthe Lyxos • 278 • 2 • 12
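The crux of the question is that training stops some rounds *after* the best iteration, so the last tree is not the best one. A pure-Python sketch of the early-stopping bookkeeping (not xgboost code, just the logic):

```python
def early_stop(eval_scores, rounds):
    """Return (best_iteration, last_iteration) for a list of eval losses.

    Stops once `rounds` iterations pass without improvement, mimicking
    early_stopping_rounds.
    """
    best_it, best = 0, float("inf")
    for it, score in enumerate(eval_scores):
        if score < best:
            best_it, best = it, score
        elif it - best_it >= rounds:
            return best_it, it
    return best_it, len(eval_scores) - 1

# Loss improves until round 2, then worsens for 3 straight rounds:
print(early_stop([0.9, 0.5, 0.4, 0.6, 0.7, 0.8], rounds=3))  # (2, 5)
```

This is why predictions should use the recorded best iteration (e.g. the model's best_iteration attribute) rather than the final one.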
14 votes, 1 answer

How to get predictions from XGBoost and the XGBoost Scikit-Learn wrapper to match?

I am new to XGBoost in Python, so I apologize if the answer here is obvious, but I am trying to take a pandas DataFrame and get XGBoost in Python to give me the same predictions I get when I use the Scikit-Learn wrapper for the same exercise. So far…
Joseph E • 143 • 1 • 1 • 5
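The two APIs give matching predictions only when every hyperparameter lines up, and several are spelled differently between them. A sketch of the usual mismatches, with hypothetical values:

```python
# Native API: parameters passed to xgb.train(params, dtrain, num_boost_round)
native_params = {"max_depth": 3, "eta": 0.1,
                 "objective": "binary:logistic", "seed": 0}
num_boost_round = 100

# Scikit-Learn wrapper: the same settings under their wrapper names
sklearn_params = {
    "max_depth": 3,
    "learning_rate": 0.1,   # alias of eta
    "objective": "binary:logistic",
    "random_state": 0,      # alias of seed
    "n_estimators": 100,    # plays the role of num_boost_round
}
```

Differing defaults (notably the learning rate and number of rounds) are the most common reason the two paths diverge.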
14 votes, 1 answer

How to write a custom evaluation metric in python for xgboost?

I would like to add the kappa evaluation metric to use in xgboost in Python. I am having trouble understanding how to connect a Python function with xgboost. According to the xgboost documentation, a "User can add multiple evaluation metrics, for…
Greg • 8,175 • 16 • 72 • 125
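The calling convention is a callable taking (preds, dtrain) and returning a (name, value) pair, passed to xgb.train as feval (custom_metric in newer releases). The sketch below uses a simple error rate rather than Cohen's kappa to stay short, and a stand-in class for xgboost.DMatrix so it runs on its own:

```python
def eval_error(preds, dtrain):
    """Custom metric: (name, value), lower is better by convention."""
    labels = dtrain.get_label()
    pred_labels = [1 if p > 0.5 else 0 for p in preds]
    err = sum(p != y for p, y in zip(pred_labels, labels)) / len(labels)
    return "my-error", err

class FakeDMatrix:
    """Stand-in for xgboost.DMatrix exposing only get_label()."""
    def __init__(self, labels):
        self._labels = labels
    def get_label(self):
        return self._labels

name, value = eval_error([0.9, 0.2, 0.8], FakeDMatrix([1, 0, 0]))
```

To plug in kappa, replace the body of `eval_error` with the kappa computation over `labels` and `pred_labels`; the (preds, dtrain) -> (name, value) contract stays the same.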
14 votes, 7 answers

xgboost installation issue with anaconda

I am using Anaconda. I first switched to Python 2 (version 2.7.11). python -V Python 2.7.11 :: Continuum Analytics, Inc. I used the following command to install xgboost in anaconda. conda install -c https://conda.anaconda.org/akode xgboost I then…
wen • 1,875 • 4 • 26 • 43
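The akode channel referenced in the question has long been unmaintained; the package is now published on conda-forge (as py-xgboost). A current equivalent of the asker's command:

```shell
# Install XGBoost's Python package from the conda-forge channel:
conda install -c conda-forge py-xgboost
```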
14 votes, 9 answers

How can I install the XGBoost package in Python on Windows?

I tried to install the XGBoost package in Python. I am using Windows OS, 64-bit. I have gone through the following. The package directory states that xgboost is unstable for Windows and is disabled: pip installation on Windows is currently disabled for…
shan • 553 • 2 • 9 • 25
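The restriction described in the question was lifted years ago: the project now publishes prebuilt 64-bit Windows wheels on PyPI, so on any recent Python the plain command suffices:

```shell
# Installs a prebuilt binary wheel; no compiler or Visual Studio needed:
pip install xgboost
```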
14 votes, 1 answer

How is xgboost quality calculated?

Could someone explain how the Quality column in the xgboost R package is calculated in the xgb.model.dt.tree function? In the documentation it says that Quality "is the gain related to the split in this specific node". When you run the following…
dataShrimp • 808 • 9 • 14
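Quality is the split-gain formula from the XGBoost paper, and it can be recomputed by hand from the gradient and hessian sums of the two children. A pure-Python sketch (lam and gamma default to xgboost's defaults of 1 and 0):

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain = 1/2 [ GL^2/(HL+lam) + GR^2/(HR+lam)
                    - (GL+GR)^2/(HL+HR+lam) ] - gamma"""
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma

# Example with made-up child sums: GL=-2, HL=2, GR=3, HR=3
gain = split_gain(-2, 2, 3, 3)
```

Matching this value against a tree dump requires using the same lambda and gamma the model was trained with.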
14 votes, 1 answer

How is xgboost cover calculated?

Could someone explain how the Cover column in the xgboost R package is calculated in the xgb.model.dt.tree function? In the documentation it says that Cover "is a metric to measure the number of observations affected by the split". When you run the…
dataShrimp • 808 • 9 • 14
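Cover is the sum of the second-order gradients (hessians) over the rows reaching the node. For squared-error loss each hessian is 1, so Cover equals the row count; for binary logistic loss it is the sum of p·(1-p), sketched below:

```python
def cover_logistic(probs):
    """Cover at a node under logistic loss: sum of hessians p*(1-p)."""
    return sum(p * (1.0 - p) for p in probs)

# Two rows, each with predicted probability 0.5, contribute 0.25 apiece:
print(cover_logistic([0.5, 0.5]))  # 0.5
```

This is why Cover is only approximately "the number of observations" for non-squared-error objectives.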
14 votes, 2 answers

How to use XGBoost algorithm for regression in R?

I was trying the XGBoost technique for prediction. As my dependent variable is continuous, I was doing regression with XGBoost, but most of the references available on various portals are for classification. Though I know that by using objective…
Amarjeet • 907 • 2 • 9 • 14
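In both the R and Python interfaces, switching to regression is mostly a matter of the objective; the rest of the workflow is unchanged. A hypothetical parameter set for illustration (older releases spell the objective reg:linear):

```python
# Illustrative regression parameters; values are placeholders, not tuned:
params = {
    "objective": "reg:squarederror",  # continuous target
    "eta": 0.1,
    "max_depth": 6,
    "eval_metric": "rmse",
}
```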
14 votes, 1 answer

How to access the weighting of individual decision trees in xgboost?

I'm using xgboost for ranking with param = {'objective':'rank:pairwise', 'booster':'gbtree'}. As I understand it, gradient boosting works by calculating the weighted sum of the learned decision trees. How can I access the weights that are assigned to…
саша • 521 • 5 • 20
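Part of the answer is that there is no separate per-tree weight vector to access: every tree's output is shrunk by the same learning rate (eta), and the prediction is the sum of the shrunken outputs plus the base score. A toy sketch with stand-in trees:

```python
def tree1(x):
    """Stand-in for a learned tree's leaf value."""
    return 1.0 if x > 0 else -1.0

def tree2(x):
    return 0.5

def predict(x, trees, eta=0.3, base_score=0.5):
    """Additive model: base score plus eta-shrunk tree outputs."""
    return base_score + sum(eta * t(x) for t in trees)

# 0.5 + 0.3*1.0 + 0.3*0.5 = 0.95 (up to float rounding)
pred = predict(2.0, [tree1, tree2])
```

The shrinkage factor is thus the "weight", and it is uniform; the per-leaf values themselves can be inspected via the model dump.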
13 votes, 2 answers

XGBoost for multiclassification and imbalanced data

I am dealing with a classification problem with 3 classes [0, 1, 2] and an imbalanced class distribution, as shown below. I want to apply XGBClassifier (in Python) to this classification problem, but the model does not respond to class_weight…
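XGBClassifier indeed has no class_weight argument; the usual workaround is per-row sample weights passed to fit(..., sample_weight=w). A pure-Python sketch of "balanced" weights, n_samples / (n_classes * count[class]), with toy labels standing in for the asker's data:

```python
from collections import Counter

def balanced_weights(y):
    """One weight per row: n_samples / (n_classes * count of that row's class)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return [n / (k * counts[label]) for label in y]

y = [0, 0, 0, 0, 1, 1, 2, 2]   # imbalanced toy labels
w = balanced_weights(y)        # minority-class rows get larger weights
```

This matches what scikit-learn's compute_sample_weight("balanced", y) produces, so either route works.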
13 votes, 1 answer

XGBoost produce prediction result and probability

I am probably looking right over it in the documentation, but I wanted to know if there is a way with XGBoost to generate both the prediction and the probability for the results. In my case, I am trying to predict with a multi-class classifier. It would be…
scarpacci • 8,957 • 16 • 79 • 144
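With the scikit-learn wrapper, predict_proba() returns one probability per class and predict() is simply its argmax, so both are available from one call to predict_proba(). A pure-Python sketch with stand-in probabilities for one row of a 3-class problem:

```python
probs = [0.1, 0.7, 0.2]   # one row of predict_proba() output (made up)

# The predicted label is the index of the largest probability:
label = max(range(len(probs)), key=probs.__getitem__)
print(label, probs[label])  # 1 0.7
```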