
How to obtain a confidence interval or a measure of prediction dispersion when using xgboost for classification?

For example, if xgboost predicts that the probability of an event is 0.9, how can the confidence in that probability be obtained?

Also, is this confidence assumed to be heteroskedastic?

Greg

1 Answer


To produce confidence intervals for an xgboost model, train several models (you can use bagging for this). Each model produces a prediction for every test sample, so for each test sample the predictions of all models form a distribution. From that per-sample distribution you can compute a confidence interval with basic statistics, for example the range that contains the central 95% of the predictions.

pplonski
  • I propose running 100 models (the more, the better) and checking the range in which 95% of the values lie. The response variable is homoscedastic. – pplonski May 26 '16 at 06:18
  • The mean and standard deviation of the predictions are NOT the same as a confidence interval. – michel Oct 21 '16 at 03:26
  • Of course the mean and std of the predictions are a different thing from confidence intervals; the question was how to compute confidence intervals, and I gave a recipe for this. – pplonski Oct 24 '16 at 07:59
  • Hello @pplonski, how do the 100 models differ? Just the seed, or the training data too? Thanks – itscarlayall Jun 08 '23 at 11:15
  • Just change the seed. – pplonski Jun 09 '23 at 08:22