How to obtain a confidence interval or a measure of prediction dispersion when using xgboost for classification?

Question

So, for example, if xgboost predicts that the probability of an event is 0.9, how can the confidence in that probability be obtained?

Also, is this confidence assumed to be heteroskedastic?

pplonski · Accepted Answer · 2016-05-24T19:58:22.927

8

To produce confidence intervals for an xgboost model, you should train several models (you can use bagging for this). Each model will produce a response for a given test sample; together, these responses form a distribution from which you can compute confidence intervals using basic statistics. You should produce a response distribution for each test sample.
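The recipe above can be sketched roughly as follows. This is only an illustration, not a definitive implementation: the function names (`bootstrap_prediction_intervals`, `nearest_rank_percentile`, `xgb_fit_predict`) and the hyperparameters are assumptions made for the example. It refits the model on bootstrap resamples of the training data and reports, for each test sample, the range that contains the central 95% of the predicted probabilities.

```python
import random


def nearest_rank_percentile(sorted_vals, q):
    """Value at quantile q (0..1) of an already sorted list, without interpolation."""
    return sorted_vals[int(round(q * (len(sorted_vals) - 1)))]


def bootstrap_prediction_intervals(fit_predict, X_train, y_train, X_test,
                                   n_models=100, alpha=0.05, seed=0):
    """Refit on bootstrap resamples and summarise predictions per test sample.

    fit_predict(X, y, X_test, seed) must return one probability per row of X_test.
    Returns a list of (lower, median, upper) tuples, one per test sample.
    """
    rng = random.Random(seed)
    n = len(X_train)
    runs = []
    for m in range(n_models):
        # Bagging step: draw n training rows with replacement for this model.
        idx = [rng.randrange(n) for _ in range(n)]
        X_boot = [X_train[i] for i in idx]
        y_boot = [y_train[i] for i in idx]
        runs.append(list(fit_predict(X_boot, y_boot, X_test, m)))

    intervals = []
    for j in range(len(X_test)):
        # Distribution of the n_models responses for test sample j.
        preds = sorted(run[j] for run in runs)
        intervals.append((
            nearest_rank_percentile(preds, alpha / 2),
            nearest_rank_percentile(preds, 0.5),
            nearest_rank_percentile(preds, 1 - alpha / 2),
        ))
    return intervals


def xgb_fit_predict(X, y, X_test, seed):
    """Hypothetical xgboost wrapper; the hyperparameters are illustrative only."""
    # Imported here so the resampling helper above works without xgboost installed.
    import numpy as np
    from xgboost import XGBClassifier

    model = XGBClassifier(n_estimators=50, subsample=0.8, random_state=seed)
    model.fit(np.asarray(X), np.asarray(y))
    # Column 1 holds the predicted probability of the positive class.
    return model.predict_proba(np.asarray(X_test))[:, 1].tolist()
```

Calling `bootstrap_prediction_intervals(xgb_fit_predict, X_train, y_train, X_test)` would then give one `(lower, median, upper)` tuple per test row. Note that these ranges describe how much the prediction varies across refitted models, and that a small resample could end up containing only one class, which this sketch does not handle.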

edited May 24 '16 at 19:58

answered May 24 '16 at 19:50


I propose to run 100 models (the more the better) and check in what range 95% of the values lie. The response variable is homoscedastic. – pplonski May 26 '16 at 06:18
3

The mean and standard deviation of the predictions is NOT the same as a confidence interval. – michel Oct 21 '16 at 03:26
Of course the mean and std of predictions are a different thing from confidence intervals. The question was how to compute confidence intervals, and I gave a recipe for this. – pplonski Oct 24 '16 at 07:59
Hello @pplonski, how do the 100 models differ? Just the seed, or the training data too? Thanks – itscarlayall Jun 08 '23 at 11:15
just change seed – pplonski Jun 09 '23 at 08:22
