I am using a bootstrapped dataset to fit a model. After fitting the model, I would like to change the bootstrapped dataset and use this new dataset to predict.

My problem is that I can't change the bootstrapped dataset. R often reports that the variable I am trying to change cannot be found. Other times (as in the example below), it runs, but the mean is calculated over the entire dataset rather than within each bootstrapped sample.

Why is this?

library(tidymodels)
library(broom)

year <- rep(2014:2016, length.out=10000)
group <- factor(sample(c(0,1,2,3,4,5,6), replace=TRUE, size=10000))
female <- sample(c(0,1), replace=TRUE, size=10000)
smoker <- sample(c(0,1), replace=TRUE, size=10000)
dta <- tibble(year = year, group = group, female = female, smoker = smoker)

boot <- bootstraps(dta,
                   times = 2,
                   apparent = TRUE,
                   replace = TRUE)

mods <- boot %>%
  nest(data = c(-all_of(female))) %>%
  mutate(model = map(data, ~ glm(smoker ~ group, data = .,
                                 family = binomial(link = "probit"))))

new_boot <- boot %>%
  group_by(id) %>%  # calculate the mean by bootstrapped sample
  mutate(female=mean(female),
         smoker=mean(smoker))

new_boot # female and smoker are calculated for entire dataset

  splits                id         female smoker
  <list>                <chr>       <dbl>  <dbl>
1 <split [10000/3578]>  Bootstrap1  0.492  0.502
2 <split [10000/3681]>  Bootstrap2  0.492  0.502
3 <split [10000/10000]> Apparent    0.492  0.502
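
For comparison, here is a minimal sketch of the result being sought: one mean per bootstrapped sample. It rests on two assumptions that are not part of the code above. First, that the object returned by `bootstraps()` holds only the `splits` and `id` columns, so `mean(female)` inside `mutate()` probably falls back to the global `female` vector. Second, that `analysis()` from rsample is an appropriate way to pull the resampled rows out of each split. The smaller dataset and the seed are hypothetical, chosen only to keep the sketch short.

```r
library(rsample)
library(dplyr)
library(purrr)

set.seed(1)  # hypothetical seed, only so the sketch is reproducible

# Smaller stand-in for the dataset above (assumption: 1000 rows is enough here)
dta <- tibble(
  female = sample(c(0, 1), replace = TRUE, size = 1000),
  smoker = sample(c(0, 1), replace = TRUE, size = 1000)
)

boot <- bootstraps(dta, times = 2, apparent = TRUE)

# The resampled rows live inside each split object, not in columns of `boot`,
# so each split is unpacked with analysis() before taking the mean.
boot_means <- boot %>%
  mutate(
    female = map_dbl(splits, ~ mean(analysis(.x)$female)),
    smoker = map_dbl(splits, ~ mean(analysis(.x)$smoker))
  )

boot_means
```

With this approach the `Apparent` row should reproduce the full-data means, while the `Bootstrap` rows can differ from it and from each other.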

Why is this? How can I change the bootstrapped sample?

