I am fitting a model on a bootstrapped dataset. After fitting the model, I would like to modify the bootstrapped data and use the modified data for prediction.
My problem is that I can't modify the bootstrapped dataset. Often I get an error saying that the variable I am trying to change cannot be found. Other times (as in the example below) the code runs, but the mean is not calculated per bootstrapped sample.
Why is this?
library(tidymodels)
library(broom)
year <- rep(2014:2016, length.out=10000)
group <- factor(sample(c(0,1,2,3,4,5,6), replace=TRUE, size=10000))
female <- sample(c(0,1), replace=TRUE, size=10000)
smoker <- sample(c(0,1), replace=TRUE, size=10000)
dta <- tibble(year = year, group = group, female = female, smoker = smoker)
boot <- bootstraps(dta,
                   times = 2,
                   apparent = TRUE,
                   replace = TRUE)
mods <- boot %>%
  nest(data = c(-all_of(female))) %>%
  mutate(model = map(data, ~ glm(smoker ~ group, data = .,
                                 family = binomial(link = "probit"))))
new_boot <- boot %>%
  group_by(id) %>%  # intended: calculate the mean by bootstrapped sample
  mutate(female = mean(female),
         smoker = mean(smoker))
new_boot  # female and smoker are means over the entire dataset, not per sample
  splits                id         female smoker
  <list>                <chr>       <dbl>  <dbl>
1 <split [10000/3578]>  Bootstrap1  0.492  0.502
2 <split [10000/3681]>  Bootstrap2  0.492  0.502
3 <split [10000/10000]> Apparent    0.492  0.502
Why does every sample get the same value, and how can I modify the data inside each bootstrapped sample?
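For context, here is a minimal sketch of one possible direction, not a definitive solution. It rests on the observation that `boot` itself only has the columns `splits` and `id`, so `female` and `smoker` inside `mutate()` appear to resolve to the global vectors rather than to anything per sample. The sketch pulls each resample out with `rsample::analysis()` and works on that. The smaller `n`, the seed, and the column names `data`, `female_mean`, `smoker_mean`, `new_data`, `model`, and `pred` are assumptions made for illustration only.

```r
library(tidymodels)  # attaches rsample, dplyr, purrr and tibble

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 1000    # smaller than in the question, to keep the sketch fast
dta <- tibble(
  group  = factor(sample(0:6, size = n, replace = TRUE)),
  female = sample(c(0, 1), size = n, replace = TRUE),
  smoker = sample(c(0, 1), size = n, replace = TRUE)
)

boot <- bootstraps(dta, times = 2, apparent = TRUE)

# The resampled rows live inside each rsplit object, so extract them first.
per_sample <- boot %>%
  mutate(
    data        = map(splits, analysis),              # one tibble per resample
    female_mean = map_dbl(data, ~ mean(.x$female)),   # mean within that resample
    smoker_mean = map_dbl(data, ~ mean(.x$smoker))
  )

# Fit the model from the question on each resample, then predict on a
# modified copy of that resample (here: female replaced by its sample mean).
fits <- per_sample %>%
  mutate(
    model    = map(data, ~ glm(smoker ~ group, data = .x,
                               family = binomial(link = "probit"))),
    new_data = map(data, ~ mutate(.x, female = mean(female))),
    pred     = map2(model, new_data,
                    ~ predict(.x, newdata = .y, type = "response"))
  )

per_sample %>% select(id, female_mean, smoker_mean)
```

Because the model formula from the question only uses `group`, changing `female` in `new_data` does not alter the predictions here. The modification step is shown only to illustrate where such a change could be made.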