
I am using XGBoost's sklearn API with sklearn's RandomizedSearchCV() to train a boosted tree model with cross-validation. My classes are imbalanced, so I've supplied the scale_pos_weight parameter to my XGBClassifier. For simplicity, let's say I'm doing cross-validation with two folds (k = 2). At the end of this post, I've provided an example of the model I'm fitting.

How is accuracy (or any metric) calculated on the validation set? Is the accuracy weighted using the scale_pos_weight argument given to XGBoost, or does sklearn calculate the unweighted accuracy?

import xgboost as xgb
from sklearn.model_selection import RandomizedSearchCV

xgb_estimator = xgb.XGBClassifier(booster = "gbtree")

tune_grid = {"scale_pos_weight": [1, 10, 100],  "max_depth": [1, 5, 10]} # simple hyperparameters as example.

xgb_search = RandomizedSearchCV(xgb_estimator, tune_grid, cv=2, n_iter = 10,
                                 scoring = "accuracy", refit = True, 
                                 return_train_score = True)

results = xgb_search.fit(X, y)
results.cv_results_  # look at cross-validation metrics

1 Answer


Does sklearn calculate the unweighted accuracy?

scikit-learn selects the model that maximizes the scoring parameter, computed on each held-out fold. When scoring="accuracy" is given, it reports plain, unweighted accuracy: the scale_pos_weight you pass to XGBoost only reweights the positive class in XGBoost's training loss and has no effect on how sklearn scores the validation folds. Unweighted accuracy is usually misleading on imbalanced problems.
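
To see why, here is a minimal sketch (with made-up labels and predictions, not taken from the question) contrasting the two metrics on an imbalanced problem:

from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical toy data: 9 negatives, 1 positive.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 10

print(accuracy_score(y_true, y_pred))           # 0.9, looks strong
print(balanced_accuracy_score(y_true, y_pred))  # 0.5, chance level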

Pass scoring="balanced_accuracy" instead: balanced accuracy is the macro-average of recall over the classes, which is equivalent to accuracy with each sample weighted inversely to its class frequency.
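
Applied to the search in the question, the only change is the scoring argument (a sketch assuming the same xgb_estimator, tune_grid, X, and y as above):

xgb_search = RandomizedSearchCV(xgb_estimator, tune_grid, cv=2, n_iter = 10,
                                scoring = "balanced_accuracy",  # macro-averaged recall
                                refit = True, return_train_score = True)

results = xgb_search.fit(X, y)
results.cv_results_  # validation scores are now balanced accuracy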
