I have a question about the xgboost classifier with the sklearn API. It seems there should be a parameter that sets how high the predicted probability must be for a sample to be classified as True, but I can't find it.
Normally, xgb.predict returns the predicted class (0 or 1 here) and xgb.predict_proba returns probabilities in the interval [0, 1]. I think the two results are related: there should be a probability threshold that decides each sample's class.
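import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# `data` is a pandas DataFrame with a 'target' column;
# `features` is the list of feature column names.
dtrain, dtest = train_test_split(data, test_size=0.1, random_state=22)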
param_dict = {
    'base_score': 0.5,
    'booster': 'gbtree',
    'colsample_bylevel': 1,
    'colsample_bytree': 1,
    'gamma': 0,
    'learning_rate': 0.1,
    'max_delta_step': 0,
    'max_depth': 4,
    'min_child_weight': 6,
    'missing': None,
    'n_estimators': 1000,
    'objective': 'binary:logistic',
    'reg_alpha': 0,
    'reg_lambda': 1,
    'scale_pos_weight': 1,
    'subsample': 1,
}
xgb = XGBClassifier(**param_dict, n_jobs=2)
xgb.fit(dtrain[features], dtrain['target'])
result_boolean = xgb.predict(dtest[features])
print(np.sum(result_boolean))
Output: 936
result_proba = xgb.predict_proba(dtest[features])
result_boolean2 = (result_proba[:, 1] > 0.5)
print(np.sum(result_boolean2))
Output: 936
It looks like the default probability threshold is 0.5, since both result arrays contain the same number of True values. But I can't find where to adjust it; the predict signature has no threshold argument:
predict(data, output_margin=False, ntree_limit=None, validate_features=True)
I also tried changing base_score, but it didn't affect the result.
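Applying a different threshold by hand to the predict_proba output is easy enough. Here is a minimal sketch of what I mean; the helper name predict_with_threshold and the probability values are made up for illustration and are not part of xgboost:

```python
import numpy as np


def predict_with_threshold(proba, threshold=0.5):
    """Return a boolean array: True where P(class 1) exceeds the threshold.

    `proba` has the shape returned by predict_proba: (n_samples, 2).
    This helper is hypothetical, not an xgboost or sklearn function.
    """
    return proba[:, 1] > threshold


# Made-up probabilities standing in for xgb.predict_proba(dtest[features]).
example_proba = np.array([
    [0.90, 0.10],
    [0.55, 0.45],
    [0.40, 0.60],
    [0.20, 0.80],
])

for t in (0.4, 0.5, 0.7):
    n_true = int(np.sum(predict_with_threshold(example_proba, t)))
    print(f"threshold={t}: {n_true} samples predicted True")
```

Raising the threshold reduces the number of samples predicted True, which is the behaviour I want to control from inside the classifier.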
The main reason I want to change the probability threshold is that I want to test XGBClassifier with different thresholds using GridSearchCV. xgb.predict_proba doesn't seem to fit into GridSearchCV, which scores models through predict. How can I change the probability threshold in XGBClassifier?
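To make the goal concrete, below is a rough sketch of the kind of thing I am after, not something I know xgboost supports. It assumes a hypothetical wrapper class, ThresholdClassifier, that exposes the threshold as an ordinary hyperparameter so GridSearchCV can tune it. LogisticRegression and synthetic data stand in for XGBClassifier and my dataset so the snippet runs on its own; any estimator with predict_proba should fit the same slot.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


class ThresholdClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: binary classifier with a tunable threshold."""

    def __init__(self, estimator=None, threshold=0.5):
        self.estimator = estimator
        self.threshold = threshold

    def fit(self, X, y):
        # Fit a fresh copy so the estimator passed in is left untouched.
        self.estimator_ = clone(self.estimator).fit(X, y)
        self.classes_ = self.estimator_.classes_
        return self

    def predict_proba(self, X):
        return self.estimator_.predict_proba(X)

    def predict(self, X):
        # Positive class only when its probability exceeds the threshold.
        positive = self.predict_proba(X)[:, 1] > self.threshold
        return self.classes_[positive.astype(int)]


# Synthetic binary data in place of my real dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=22)

search = GridSearchCV(
    ThresholdClassifier(estimator=LogisticRegression(max_iter=1000)),
    param_grid={"threshold": [0.3, 0.5, 0.7]},
    scoring="f1",
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```

With a wrapper like this, the threshold sits in param_grid next to the other hyperparameters. I would still prefer a built-in option in XGBClassifier if one exists.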