Prediction Intervals from Quantile Regression Forests have higher coverage than expected?


Question:

What factors might cause a prediction interval to have higher coverage than expected, particularly for quantile regression forests fit with the ranger package?

Specific Context + REPREX:

I am using quantile regression forests through parsnip and the tidymodels suite of packages, with ranger as the engine, to generate prediction intervals. While reviewing an example using the Ames housing data, I was surprised to see in the example below that my 90% prediction intervals had an empirical coverage of about 97% when evaluated on a hold-out dataset (coverage on the training data was even higher).

This was all the more surprising because my model performs substantially worse on the hold-out set than on the training set, so I would have guessed that coverage would be lower than expected, not higher.

Load libraries and data, and set up the split:

```{r}
library(tidyverse)
library(tidymodels)
library(AmesHousing)

ames <- make_ames() %>% 
  mutate(Years_Old = Year_Sold - Year_Built,
         Years_Old = ifelse(Years_Old < 0, 0, Years_Old))

set.seed(4595)
data_split <- initial_split(ames, strata = "Sale_Price", p = 0.75)

ames_train <- training(data_split)
ames_test  <- testing(data_split)
```

Specify model workflow:

```{r}
rf_recipe <- 
  recipe(
    Sale_Price ~ Lot_Area + Neighborhood  + Years_Old + Gr_Liv_Area + Overall_Qual + Total_Bsmt_SF + Garage_Area, 
    data = ames_train
  ) %>%
  step_log(Sale_Price, base = 10) %>%
  step_other(Neighborhood, Overall_Qual, threshold = 50) %>% 
  step_novel(Neighborhood, Overall_Qual) %>% 
  step_dummy(Neighborhood, Overall_Qual) 

rf_mod <- rand_forest() %>% 
  set_engine("ranger", importance = "impurity", seed = 63233, quantreg = TRUE) %>% 
  set_mode("regression")

set.seed(63233)
rf_wf <- workflows::workflow() %>% 
  add_model(rf_mod) %>% 
  add_recipe(rf_recipe) %>% 
  fit(ames_train)
```

Make predictions on training and hold-out datasets:

```{r}
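# Pull the underlying ranger fit out of the workflow and predict the 5%, 50%
# and 95% quantiles on the preprocessed (baked) data. In newer versions of
# workflows, extract_fit_engine(rf_wf) and extract_recipe(rf_wf) are the
# equivalents of rf_wf$fit$fit$fit and pull_workflow_prepped_recipe(rf_wf).
rf_preds_train <- predict(
  rf_wf$fit$fit$fit, 
  workflows::pull_workflow_prepped_recipe(rf_wf) %>% bake(ames_train),
  type = "quantiles",
  quantiles = c(0.05, 0.50, 0.95)
  ) %>% 
  with(predictions) %>%                        # matrix with one column per quantile
  as_tibble() %>% 
  set_names(paste0(".pred", c("_lower", "", "_upper"))) %>% 
  mutate(across(contains(".pred"), ~10^.x)) %>%  # back-transform from log10 scale
  bind_cols(ames_train)                          # attach Sale_Price and predictors

# Same steps for the hold-out data
rf_preds_test <- predict(
  rf_wf$fit$fit$fit, 
  workflows::pull_workflow_prepped_recipe(rf_wf) %>% bake(ames_test),
  type = "quantiles",
  quantiles = c(0.05, 0.50, 0.95)
  ) %>% 
  with(predictions) %>% 
  as_tibble() %>% 
  set_names(paste0(".pred", c("_lower", "", "_upper"))) %>% 
  mutate(across(contains(".pred"), ~10^.x)) %>% 
  bind_cols(ames_test)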
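The `~10^.x` step back-transforms the predicted log10 quantiles to dollars. Because `10^x` is strictly increasing, an observation falls inside the back-transformed interval exactly when its log10 value falls inside the original one, so this step should not by itself change coverage. A minimal base-R check on simulated values (the mean 5 and sd 0.2 are arbitrary assumptions, not estimates from the Ames data):

```{r}
# Simulated check (not the Ames data): a strictly increasing back-transform
# applied to the response and to both bounds leaves every "covered" flag unchanged.
set.seed(1)
n <- 1000
y_log <- rnorm(n, mean = 5, sd = 0.2)   # assumed response on the log10 scale
lower_log <- 5 + qnorm(0.05) * 0.2      # true 5% quantile on the log10 scale
upper_log <- 5 + qnorm(0.95) * 0.2      # true 95% quantile on the log10 scale

covered_log  <- y_log >= lower_log & y_log <= upper_log
covered_orig <- 10^y_log >= 10^lower_log & 10^y_log <= 10^upper_log

identical(covered_log, covered_orig)  # flags agree on both scales
mean(covered_orig)                    # near the nominal 0.90 for these draws
```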

Show that the coverage rate on both the training and hold-out data is far above the expected 90% (empirically about 98% and 97%, respectively):

```{r}
rf_preds_train %>%
  mutate(covered = ifelse(Sale_Price >= .pred_lower & Sale_Price <= .pred_upper, 1, 0)) %>% 
  summarise(n = n(),
            n_covered = sum(covered),
            covered_prop = n_covered / n,
            stderror = sd(covered) / sqrt(n)) %>% 
  mutate(min_coverage = covered_prop - 2 * stderror,
         max_coverage = covered_prop + 2 * stderror)
# # A tibble: 1 x 6
#       n n_covered covered_prop stderror min_coverage max_coverage
#   <int>     <dbl>        <dbl>    <dbl>        <dbl>        <dbl>
# 1  2199      2159        0.982  0.00285        0.976        0.988

rf_preds_test %>%
  mutate(covered = ifelse(Sale_Price >= .pred_lower & Sale_Price <= .pred_upper, 1, 0)) %>% 
  summarise(n = n(),
            n_covered = sum(covered),
            covered_prop = n_covered / n,
            stderror = sd(covered) / sqrt(n)) %>% 
  mutate(min_coverage = covered_prop - 2 * stderror,
         max_coverage = covered_prop + 2 * stderror)
# # A tibble: 1 x 6
#       n n_covered covered_prop stderror min_coverage max_coverage
#   <int>     <dbl>        <dbl>    <dbl>        <dbl>        <dbl>
# 1   731       706        0.966  0.00673        0.952        0.979
```
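To gauge whether the gap from 90% could be sampling noise, the counts printed above can be compared against the nominal rate with base R's exact `binom.test()`. Under nominal coverage, the hold-out standard error is about `sqrt(0.9 * 0.1 / 731)`, roughly 0.011, so 0.966 sits about six standard errors above 0.90. The `coverage_summary()` helper below is hypothetical (not from the original post); it is a base-R sketch of the same summary as the dplyr pipelines, plus the binomial p-value.

```{r}
# Exact binomial tests of the reported counts against the nominal 90% rate
# implied by the 0.05 and 0.95 quantiles.
test_holdout <- binom.test(x = 706, n = 731, p = 0.90)
test_holdout$conf.int   # exact (Clopper-Pearson) 95% interval for coverage
test_holdout$p.value

test_train <- binom.test(x = 2159, n = 2199, p = 0.90)
test_train$conf.int
test_train$p.value

# Hypothetical helper: same columns as the dplyr summaries above, base R only.
coverage_summary <- function(y, lower, upper, nominal = 0.90) {
  covered <- as.numeric(y >= lower & y <= upper)
  n <- length(covered)
  n_covered <- sum(covered)
  prop <- n_covered / n
  se <- sd(covered) / sqrt(n)
  data.frame(
    n = n,
    n_covered = n_covered,
    covered_prop = prop,
    stderror = se,
    min_coverage = prop - 2 * se,
    max_coverage = prop + 2 * se,
    binom_p_value = binom.test(n_covered, n, p = nominal)$p.value
  )
}

# e.g. coverage_summary(rf_preds_test$Sale_Price,
#                       rf_preds_test$.pred_lower, rf_preds_test$.pred_upper)
```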

Guesses:

1. Something about the ranger package or quantile regression forests is overly extreme in the way it estimates quantiles, or I am somehow overfitting in the "extreme" direction, leading to highly conservative prediction intervals.
2. This is a quirk specific to this dataset/model.
3. I am missing something or have set something up incorrectly.
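One way to probe the second guess is to run the same kind of ranger quantile-forest prediction on simulated data where the true conditional quantiles are known, then compare coverage on training rows, on new rows, and for the true quantiles. This is a sketch under assumed settings (a linear mean with constant Gaussian noise, chosen arbitrarily); it calls `ranger()` directly rather than through parsnip/workflows, and `make_data()` and `qrf_coverage()` are hypothetical helpers.

```{r}
library(ranger)

# Hypothetical simulated data: y = 2 * x + N(0, 1) noise, so the true
# conditional 5% and 95% quantiles are 2 * x + qnorm(0.05) and 2 * x + qnorm(0.95).
set.seed(2021)
make_data <- function(n) {
  x <- runif(n, 0, 10)
  data.frame(x = x, y = 2 * x + rnorm(n, sd = 1))
}
sim_train <- make_data(2000)
sim_test  <- make_data(1000)

sim_fit <- ranger(y ~ x, data = sim_train, quantreg = TRUE, seed = 63233)

# Empirical coverage of the predicted [0.05, 0.95] interval on `newdata`.
qrf_coverage <- function(fit, newdata) {
  q <- predict(fit, data = newdata, type = "quantiles",
               quantiles = c(0.05, 0.95))$predictions
  mean(newdata$y >= q[, 1] & newdata$y <= q[, 2])
}

train_cov <- qrf_coverage(sim_fit, sim_train)  # rows the forest was fit on
test_cov  <- qrf_coverage(sim_fit, sim_test)   # new rows
c(train = train_cov, test = test_cov)

# Reference: coverage of the true conditional quantiles on the test rows.
true_cov <- mean(sim_test$y >= 2 * sim_test$x + qnorm(0.05) &
                 sim_test$y <= 2 * sim_test$x + qnorm(0.95))
true_cov
```

If the simulated forest also covers well above 90% on new rows, the behaviour is unlikely to be specific to the Ames data; if it lands near `true_cov`, the Ames setup deserves a closer look.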

This question stemmed from a related blog post: https://www.bryanshalloway.com/2021/04/21/quantile-regression-forests-for-prediction-intervals/ (Bryan Shalloway, Apr 22 '21 at 04:44)
Cross-linked question on GitHub at [imbs-hl/ranger](https://github.com/imbs-hl/ranger/issues/136#issuecomment-825169509). (Bryan Shalloway, Apr 22 '21 at 20:40)
