In this SO question, bootstrapping by several groups and subgroups seemed easy with the broom::bootstrap function: you only had to set the by_group argument to TRUE.

My desired output is a nested tibble with n rows, where the data column contains the bootstrapped data generated by each bootstrap call (and each group and subgroup has the same number of cases as in the original data).

In broom I did the following:

# packages
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)
library(rsample)
library(broom)

# some data to bootstrap
set.seed(123)
data <- tibble(
  group=rep(c('group1','group2','group3','group4'), 25),
  subgroup=rep(c('subgroup1','subgroup2','subgroup3','subgroup4'), 25),
  v1=rnorm(100),
  v2=rnorm(100)
)

# the actual approach using broom::bootstrap
tibble(id = 1:100) %>% 
  mutate(data = map(id, ~ {data %>%
      group_by(group,subgroup) %>% 
      broom::bootstrap(100, by_group=TRUE)}))

Since the broom::bootstrap function is deprecated, I rebuilt my approach with rsample::bootstraps to produce the same output. Getting there seems much more complicated. Am I doing something wrong, or has generating grouped bootstraps become more complicated in the tidyverse?

data %>%
  dplyr::mutate(group2 = group,
                subgroup2 = subgroup) %>% 
  tidyr::nest(data = -c(group2, subgroup2)) %>% 
  dplyr::mutate(boot  = map(data, ~ rsample::bootstraps(., 100))) %>% 
  pull(boot) %>% 
  purrr::map(., "splits") %>% 
  transpose %>% 
  purrr::map(., ~ purrr::map_dfr(., rsample::analysis)) %>% 
  tibble(id = 1:length(.), data = .)
TimTeaFan

1 Answer

Annoyingly, the strata argument to rsample::bootstraps() only accepts a single variable, but we can use tidyr::unite() to solve this.

I'm hoping this gets you what you want.

data %>%
  unite("final_group", group, subgroup, remove = FALSE) %>%
  rsample::bootstraps(100, strata = final_group) %>%
  transmute(
    id = 1:100,
    data = map(splits, rsample::analysis)
  )
Ashby Thorpe
Sorry for getting back to your answer late, I was travelling. The idea with `unite` and the `strata` argument is great. However, in your current approach, every bootstrap iteration contains the same data. If you assign the result to `res` and then compare `res$data[[1]]` and `res$data[[2]]`, it's the same data. In my approach each iteration is unique. – TimTeaFan Apr 25 '23 at 05:21
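
The check described in this comment can be sketched roughly as follows. It rebuilds the example data from the question, creates the stratified resamples with `unite()` and `strata` as in the answer, and then compares the first two iterations. The names `boots`, `res`, `same_data` and `group_sizes` are made up for illustration, and the nested tibble is assembled with `tibble()` rather than `transmute()`, which is an assumption made to keep the sketch simple and not the answer's exact code. Results may depend on the installed rsample version.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)
library(rsample)

# example data from the question
set.seed(123)
data <- tibble(
  group    = rep(c("group1", "group2", "group3", "group4"), 25),
  subgroup = rep(c("subgroup1", "subgroup2", "subgroup3", "subgroup4"), 25),
  v1       = rnorm(100),
  v2       = rnorm(100)
)

# stratified bootstraps on the combined group/subgroup key
boots <- data %>%
  unite("final_group", group, subgroup, remove = FALSE) %>%
  rsample::bootstraps(times = 100, strata = final_group)

# one row per bootstrap iteration, resampled data in a list column
res <- tibble(
  id   = seq_len(nrow(boots)),
  data = map(boots$splits, rsample::analysis)
)

# are the first two iterations the same data?
same_data <- identical(res$data[[1]], res$data[[2]])
same_data

# cases per group and subgroup in the first iteration,
# to compare against count(data, group, subgroup)
group_sizes <- count(res$data[[1]], group, subgroup)
group_sizes
```

If `same_data` is `TRUE`, the iterations are duplicates as the comment reports; if it is `FALSE`, the resamples differ and only the group sizes remain to be checked against the original data.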