I have one long dataset composed of several datasets produced by multiple imputation (say, 10 imputations). An id variable identifies the imputation. On each of these imputed datasets I would like to bootstrap 10 datasets. After the bootstrap, I want to run a model on each of the 100 imputation-bootstrap combinations.
In this example I am not sure whether to use the broom::bootstrap() function or the modelr::bootstrap() function. Furthermore, the grouping seems to be lost in my pipeline.
Here is a reproducible example using the mtcars dataset:
library(tidyverse)
library(broom)
cars <- mtcars %>%
  mutate(am = as.factor(am)) %>% # This is standing in for my imputation id variable
  group_by(am)
cars
Source: local data frame [32 x 11]
Groups: am [2]
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <dbl> <dbl>
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
As you can see, the output currently shows two groups, as it should. In my dataset it would show 10, one for each imputed dataset. Now:
cars2 <- cars %>%
  broom::bootstrap(10, by_group = TRUE)
cars2
Source: local data frame [32 x 11]
Groups: replicate [10]
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <dbl> <dbl>
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Now it looks as though there are only 10 groups, one per replicate. The prior grouping does not seem to have been preserved. At this point I would expect 20 groups in total (2 x 10).
If I now do this:
cars3 <- cars2 %>%
  group_by(am)
cars3
Source: local data frame [32 x 11]
Groups: am [2]
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <dbl> <dbl>
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Now it seems like there are no replicates, only groups for am.
Is there any way to do the bootstrapping after I've grouped my original dataset? Also, ideally, after I bootstrap there should be an id that indicates which bootstrapped dataset I'm looking at.
In my ideal world my code should be able to do something like this:
cars <- mtcars %>%
  mutate(am = as.factor(am)) %>%
  group_by(am) %>%
  bootstrap(10, by_group = TRUE) %>%
  nest() %>% # create a condensed tidy dataset that has one row per imputation-bootstrap combo
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .))) # fit a model for each row (wt is only a placeholder predictor)
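To make the target structure concrete, here is a minimal sketch of one way the 2 x 10 layout could be built by hand, without either bootstrap() function. It assumes a reasonably recent tidyr (1.0 or later, for unnest(cols = ...)). The names n_boot, replicate and boot_models, the seed, and the wt predictor are placeholders of mine, not part of my real data. Each group is nested first, each nested row is repeated once per replicate, and the rows within each copy are then resampled with replacement.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)   # arbitrary seed, only so the sketch is reproducible
n_boot <- 10  # hypothetical name: number of bootstrap replicates per group

boot_models <- mtcars %>%
  mutate(am = as.factor(am)) %>%                  # stand-in for the imputation id
  group_by(am) %>%
  nest() %>%                                      # one row per imputation id
  ungroup() %>%
  mutate(replicate = list(seq_len(n_boot))) %>%   # attach replicate ids 1..n_boot to every row
  unnest(cols = replicate) %>%                    # one row per (am, replicate) combination
  mutate(
    # resample the rows of each nested data frame with replacement
    data  = map(data, ~ .x[sample(nrow(.x), replace = TRUE), , drop = FALSE]),
    # fit one model per row; wt is only a placeholder predictor
    model = map(data, ~ lm(mpg ~ wt, data = .x))
  )

boot_models
```

The result has one row per am and replicate combination (20 here), with the replicate column acting as the bootstrap id. Something like map(model, broom::tidy) could then be used to pull out the coefficients for each row.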