Questions tagged [xgboost]

XGBoost is a library for constructing boosted tree models in R, Python, Java, Scala, and C++. Use this tag for issues specific to the package (e.g., input/output, installation, functionality).

Before using the XGBoost tag, check whether your issue is really specific to XGBoost's functionality. Often, problems arise from the surrounding model-building environment (such as R's caret or Python's scikit-learn), from the quality of the data being used, or from purely statistical concerns that might belong on Cross Validated.

2788 questions
13 votes, 4 answers

XGBoost throws an error when trying to import it

I have a project that uses xgboost, and we are now moving the project to containers. But after installing xgboost with pip, it throws the following error: Traceback (most recent call last): File "restart_db.py", line 5, in from…
NotSoShabby
13 votes, 1 answer

Inputs for nDCG in sklearn

I don't understand the input format of sklearn's nDCG (http://sklearn.apachecn.org/en/0.19.0/modules/generated/sklearn.metrics.ndcg_score.html). Currently I have the following problem: I have multiple queries, for each of which the ranking…
Yank Leo
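A minimal sketch of the input layout, assuming the `ndcg_score` found in current scikit-learn releases: both `y_true` (relevance) and `y_score` (model scores) are 2-D, with one row per query and one column per candidate document, so each query occupies one row. The helper below, `ndcg` (a hypothetical name, not the sklearn function), recomputes the same quantity with NumPy using linear gains and a 1/log2(rank + 1) discount, which is how scikit-learn defines DCG; unlike scikit-learn it breaks ties in `y_score` by column order instead of averaging over them.

```python
import numpy as np


def ndcg(y_true, y_score, k=None):
    """Mean nDCG over queries: one row per query, one column per document.

    Uses linear gains (relevance values as-is) and a 1/log2(rank + 1) discount.
    Ties in y_score are broken by column order (a simplification).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    n_docs = y_true.shape[1]
    k = n_docs if k is None else min(k, n_docs)
    discount = 1.0 / np.log2(np.arange(2, k + 2))  # ranks 1..k
    per_query = []
    for rel, score in zip(y_true, y_score):
        order = np.argsort(-score, kind="stable")[:k]   # documents ranked by score
        dcg = np.sum(rel[order] * discount)
        ideal = np.sum(np.sort(rel)[::-1][:k] * discount)  # best possible ordering
        per_query.append(dcg / ideal if ideal > 0 else 0.0)
    return float(np.mean(per_query))
```

For example, `ndcg([[1, 0, 2]], [[0.1, 0.2, 0.9]])` ranks the documents as relevance 2, 0, 1, giving DCG = 2 + 0 + 0.5 = 2.5 against an ideal of 2 + 1/log2(3).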
13 votes, 4 answers

feature_names must be unique - Xgboost

I am running an xgboost model on a very sparse matrix and I am getting this error: `ValueError: feature_names must be unique`. How can I deal with this? This is my code: `yprob = bst.predict(xgb.DMatrix(test_df))[:,1]`
user2728024
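This error is commonly raised when the pandas DataFrame handed to `xgb.DMatrix` has repeated column labels (for example after a `pd.concat` or an encoding step that reuses names); whether that is the cause here is an assumption, since the question's data isn't shown. A minimal sketch of a fix renames duplicates before building the DMatrix; `make_unique_columns` is a hypothetical helper, not part of XGBoost or pandas.

```python
import pandas as pd


def make_unique_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose column labels are unique strings.

    Repeated labels get a numeric suffix, e.g. ["a", "b", "a"] -> ["a", "b", "a_1"].
    """
    used = set()
    new_cols = []
    for col in map(str, df.columns):
        candidate, k = col, 0
        # Keep bumping the suffix until the name is free (handles an existing "a_1").
        while candidate in used:
            k += 1
            candidate = f"{col}_{k}"
        used.add(candidate)
        new_cols.append(candidate)
    out = df.copy()
    out.columns = new_cols
    return out


# Usage (test_df and xgb come from the question, not defined here):
# dtest = xgb.DMatrix(make_unique_columns(test_df))
```

Apply the same renaming to the training frame as well, so the feature names seen at training and prediction time line up.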
13 votes, 2 answers

What is the difference between eval_metric and feval in xgboost?

What is the difference between feval and eval_metric in xgb.train? Both parameters are only for evaluation purposes. A post from Kaggle gives some insight…
Qbik
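As a rough illustration of the distinction: `eval_metric` is a parameter naming one of XGBoost's built-in metrics (e.g. `"auc"`, `"logloss"`), while `feval` takes a Python callable that receives the current predictions and the `DMatrix` and returns a `(name, value)` pair. A minimal sketch of such a callable follows; it assumes a binary objective where `preds` are probabilities, and `error_at_threshold` is a hypothetical name. Recent XGBoost releases deprecate `feval` in favor of `custom_metric`, so check the documentation for your version.

```python
import numpy as np


def error_at_threshold(preds, dtrain, threshold=0.5):
    """feval-style metric: (predictions, DMatrix) -> (metric name, value).

    Assumes `preds` are probabilities, as with objective="binary:logistic".
    """
    labels = dtrain.get_label()
    predicted = (np.asarray(preds) > threshold).astype(float)
    return f"error@{threshold:.2f}", float(np.mean(predicted != labels))


# Usage (params, dtrain, dvalid are placeholders, not run here):
# bst = xgb.train(params, dtrain, evals=[(dvalid, "valid")], feval=error_at_threshold)
```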
13 votes, 3 answers

Feature names in the importance plot after preprocessing

Before building a model I scale the data like this: `X = StandardScaler(with_mean = 0, with_std = 1).fit_transform(X)`, and afterwards I build a feature importance plot: xgb.plot_importance(bst, color='red') plt.title('importance', fontsize =…
Edward
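`StandardScaler.fit_transform` returns a plain NumPy array, so the column names are lost before XGBoost ever sees the data and the importance plot falls back to generic names such as `f0`, `f1`. A minimal sketch, assuming `X` starts as a pandas DataFrame, is to wrap the scaled array back into a DataFrame with the original columns and index before training; the column names and values below are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical feature frame standing in for the question's X before scaling.
X = pd.DataFrame({"age": [25.0, 32.0, 47.0, 51.0],
                  "income": [40e3, 52e3, 61e3, 75e3]})

# Same settings as the question: with_mean=0 (no centering), with_std=1 (unit variance).
scaler = StandardScaler(with_mean=False, with_std=True)

# Re-attach the original labels so downstream models see named features.
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns, index=X.index)

# Training on X_scaled (e.g. xgb.DMatrix(X_scaled, label=y) or XGBClassifier().fit(X_scaled, y))
# should keep "age" and "income" as feature names for plot_importance.
```

In scikit-learn 1.2 and later, `scaler.set_output(transform="pandas")` should achieve the same without the manual wrap.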
13 votes, 2 answers

Calibration with xgboost

I'm wondering if I can do calibration in xgboost. To be more specific, does xgboost come with an existing calibration implementation like scikit-learn's, or is there some way to put a model from xgboost into scikit-learn's…
OrlandoL
12 votes, 3 answers

How can I fix this WARNING in XGBoost?

I have an imbalanced dataset with 53,987 rows, 32 columns and 8 classes. I'm trying to perform multiclass classification. This is my code and the corresponding output: from sklearn.metrics import classification_report, accuracy_score import…
mineral
12 votes, 2 answers

How to adjust the probability threshold in an XGBoost classifier when using the Scikit-Learn API

I have a question about the xgboost classifier with the sklearn API. It seems to have a parameter that sets the probability above which a prediction is returned as True, but I can't find it. Normally, xgb.predict would return boolean and xgb.predict_proba would return…
劉金喜
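As far as I know, the scikit-learn wrapper's `predict` has no threshold argument; for binary problems it effectively cuts the positive-class probability at 0.5. A minimal sketch of applying a different cut-off to the output of `predict_proba` is below; `predict_with_threshold` is a hypothetical helper, and the probabilities are made-up numbers.

```python
import numpy as np


def predict_with_threshold(proba, threshold=0.5):
    """Map an (n_samples, 2) predict_proba array to 0/1 labels.

    A sample is labelled 1 when its positive-class probability (column 1)
    is at least `threshold`.
    """
    proba = np.asarray(proba)
    return (proba[:, 1] >= threshold).astype(int)


# Usage (clf is a fitted XGBClassifier, not defined here):
# y_pred = predict_with_threshold(clf.predict_proba(X_test), threshold=0.3)
```

Lowering the threshold trades precision for recall on the positive class, so it is usually tuned on a validation set rather than on the test data.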
12 votes, 3 answers

XGBoost - n_estimators = 1 equal to single-tree classifier?

I have a training pipeline that relies heavily on XGBoost instead of scikit-learn, only because of how cleanly XGBoost handles null values. However, I'm tasked with introducing non-technical folks to machine learning, and thought it'd be good to…
blacksite
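One reason a single boosted tree is not the same as a single decision tree: XGBoost sets each leaf's value with a second-order (Newton) step, w* = -G / (H + lambda), where G and H are the sums of the loss gradients and Hessians of the rows in that leaf, and then shrinks it by the learning rate eta (library defaults: lambda = 1, eta = 0.3). A CART classification leaf would instead predict the class frequency. The sketch below evaluates that formula for one leaf under logistic loss, assuming every row starts at probability 0.5 (base margin 0; newer releases may estimate the starting value from the data instead). It is a hand calculation of the formula from the XGBoost paper, not a call into the library, and `leaf_value` is a hypothetical name.

```python
import numpy as np


def leaf_value(y, p0=0.5, reg_lambda=1.0, eta=0.3):
    """Shrunk leaf weight eta * (-G / (H + lambda)) for logistic loss.

    y:  0/1 labels of the rows that land in this leaf
    p0: probability every row starts from before the first tree (assumed)
    """
    y = np.asarray(y, dtype=float)
    p = np.full_like(y, p0)
    g = p - y            # gradient of log loss w.r.t. the margin
    h = p * (1.0 - p)    # Hessian of log loss w.r.t. the margin
    return eta * (-g.sum() / (h.sum() + reg_lambda))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


w = leaf_value([1, 1, 0])   # two positives and one negative in the leaf
p_tree = sigmoid(0.0 + w)   # base margin 0 plus the leaf value
```

With those defaults the leaf moves the probability only to about 0.52, whereas a CART leaf holding the same three rows would predict 2/3; setting `eta=1` and `reg_lambda=0` brings the single-tree model much closer to a plain tree.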
12 votes, 3 answers

How to use XGBoost in a PySpark Pipeline

I want to update my PySpark code. In PySpark, the base model has to be put in a pipeline, and the official pipeline demo uses LogisticRegression as the base model. However, it seems that an XGBoost model cannot be used with the pipeline API. How can…
12 votes, 2 answers

How to optimize a sklearn pipeline, using XGboost, for a different `eval_metric`?

I'm trying to use XGBoost and optimize the eval_metric as auc (as described here). This works fine when using the classifier directly, but fails when I try to use it inside a pipeline. What is the correct way to pass a .fit argument to the…
sapo_cosmico
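scikit-learn's `Pipeline.fit` forwards extra keyword arguments to a step only when they are prefixed with that step's name and a double underscore, e.g. `clf__eval_metric="auc"` for a step named `"clf"`. The sketch below shows that routing with a hypothetical stand-in classifier, `RecordingClassifier`, that just records what its `fit` received, so it runs without XGBoost; with a real `XGBClassifier` step the same prefix applies. Recent XGBoost releases expect `eval_metric` in the constructor (`XGBClassifier(eval_metric="auc")`) rather than in `fit`, which sidesteps the routing question.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class RecordingClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for XGBClassifier: remembers the fit kwargs it got."""

    def fit(self, X, y, **fit_kwargs):
        self.fit_kwargs_ = fit_kwargs
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        # Always predicts the first class; only the argument routing matters here.
        return np.full(len(X), self.classes_[0])


pipe = Pipeline([("scale", StandardScaler()), ("clf", RecordingClassifier())])
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 1, 0, 1])

# The "clf__" prefix routes eval_metric to the fit() of the step named "clf".
pipe.fit(X, y, clf__eval_metric="auc")
```

One caveat: arguments such as `eval_set` routed this way reach the final step untouched, since the pipeline does not run earlier steps (here the scaler) on fit arguments.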
12 votes, 1 answer

Parallel processing with xgboost and caret

I want to parallelize the model fitting process for xgboost while using caret. From what I have seen in xgboost's documentation, the nthread parameter controls the number of threads to use while fitting the models, in the sense of building the…
drgxfs
12 votes, 2 answers

xgboost: AttributeError: 'DMatrix' object has no attribute 'handle'

The problem is really strange, because that piece of code worked fine with another dataset. The full code: import numpy as np import pandas as pd import xgboost as xgb from sklearn.cross_validation import train_test_split # # Split the Learning…
Rocketq
12 votes, 1 answer

xgboost binary logistic regression

I am having problems running logistic regression with xgboost, which can be summarized with the following example. Let's assume I have a very simple dataframe with two predictors and one target variable: df= pd.DataFrame({'X1' : pd.Series([1,0,0,1]),…
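If the symptom is that every prediction stays at (or near) 0.5, one likely culprit on a toy frame this small is `min_child_weight`: XGBoost only accepts a split when each child's sum of Hessians reaches it (default 1), and under logistic loss each row starting at probability 0.5 contributes a Hessian of only 0.5 × 0.5 = 0.25. The sketch below checks that arithmetic; it assumes a starting probability of 0.5, is a hand calculation rather than a call into XGBoost, and `split_allowed` is a hypothetical helper.

```python
def split_allowed(left_size, right_size, p=0.5, min_child_weight=1.0):
    """True if both children reach min_child_weight in summed logistic Hessians.

    Assumes every row currently has predicted probability p, so each row's
    Hessian is p * (1 - p).
    """
    h = p * (1.0 - p)
    return min(left_size, right_size) * h >= min_child_weight


# A 2/2 split of four rows gives each child a Hessian sum of 0.5 < 1.
print(split_allowed(2, 2))                          # default min_child_weight=1
print(split_allowed(2, 2, min_child_weight=0.1))    # relaxed for a toy dataset
```

With four rows, as in the frame above, even a 3/1 split leaves one child at 0.25, so no split satisfies the default, at least in the first round; lowering `min_child_weight` (for example to 0) on such a toy example lets the trees split.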
11 votes, 1 answer

Xgboost dump and load issues

I trained my xgboost pipeline model on Amazon SageMaker and saved the file locally: pickle.dump(model, open(file_name, "wb")) Then I moved to my local computer to use the model in inference mode: pickle.load(open(file_name, "rb")) XGBoostError: [11:45:49]…
Petr