I want to end up with a tidy data structure like the one below:
N | r | data | stat
---------------------------------
10 | 0.2 | <tibble> | 0.5
20 | 0.3 | <tibble> | 0.86
...
data is generated from the parameters in the first columns, and stat is computed on data. If I have the first two columns, how do I add tibbles of datasets?
As a minimal example, here is a function to create two correlated columns:
correlated_data = function(N, r) {
  MASS::mvrnorm(N, mu=c(0, 4), Sigma=matrix(c(1, r, r, 1), ncol=2))
}
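For reference, a quick check of what this function returns, as a minimal sketch. The seed and the column names x and y are my own assumptions for illustration; the matrix returned by mvrnorm has no column names here.

```r
library(tibble)

correlated_data = function(N, r) {
  MASS::mvrnorm(N, mu=c(0, 4), Sigma=matrix(c(1, r, r, 1), ncol=2))
}

set.seed(1)  # arbitrary seed, only for reproducibility
m = correlated_data(N=10, r=0.2)
dim(m)  # N rows, 2 columns

# Name the columns before converting; x and y are hypothetical names
colnames(m) = c("x", "y")
d = as_tibble(m)
```

So each call gives an N x 2 matrix, which is why it cannot be dropped into a single row of a regular column and would have to go into a list-column.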
Running this for all combinations of N and r, I start by doing:
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Make parameter combinations
expand.grid(N=c(10,20,30), r=c(0, 0.1, 0.3)) %>%
  group_by(N, r) %>%
  expand(set=1:100) %>% # create 100 of each combination
  # HERE! How to add a N x 2 tibble to each row?
  rowwise() %>%
  mutate(data=correlated_data(N, r)) %>% # this is the step that does not work
  # Compute summary stats on each (for illustration only; not tested)
  mutate(
    stats = map(data, ~cor.test(.x[, 1], .x[, 2])), # Correlation on each
    tidy_stats = map(stats, tidy)) # using broom package
In practice I have more parameters (N, r, distribution) and I will be computing more summaries. If an alternative workflow is better, I welcome that as well.
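One direction that might work is to store each dataset in a list-column with purrr::map2 instead of rowwise(). This is only a sketch under my own assumptions: the column names x and y, the seed, and pulling the correlation estimate out as stat are illustrative choices, and correlated_data is changed to return a tibble.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(broom)

# Variant of correlated_data that returns a tibble (x and y are hypothetical names)
correlated_data = function(N, r) {
  m = MASS::mvrnorm(N, mu=c(0, 4), Sigma=matrix(c(1, r, r, 1), ncol=2))
  colnames(m) = c("x", "y")
  as_tibble(m)
}

set.seed(1)  # arbitrary seed, only for reproducibility

# One row per parameter combination and replicate
results = expand_grid(N=c(10, 20, 30), r=c(0, 0.1, 0.3), set=1:100) %>%
  mutate(
    data = map2(N, r, correlated_data),             # list-column of N x 2 tibbles
    stats = map(data, ~cor.test(.x$x, .x$y)),       # correlation test per dataset
    tidy_stats = map(stats, tidy),                  # one-row tibble per test
    stat = map_dbl(tidy_stats, ~.x$estimate[[1]])   # estimated correlation
  )
```

With more parameters (for example a distribution column), pmap over the parameter columns would presumably take the place of map2, and unnest(tidy_stats) would spread the summaries into regular columns.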