Simulate many datasets in tidyr

Question

I want to end up with a tidy data structure like the one below:

N    | r     | data     | stat
---------------------------------
10   | 0.2   | <tibble> | 0.5
20   | 0.3   | <tibble> | 0.86
...

data is generated from the parameters in the first columns and stat is computed on data. If I have the first two columns, how do I add tibbles of datasets?

As a minimal example, here is a function to create two correlated columns:

correlated_data = function(N, r) {
  MASS::mvrnorm(N, mu=c(0, 4), Sigma=matrix(c(1, r, r, 1), ncol=2))
}

To run this for all combinations of N and r, I start with:

# Make parameter combinations
expand.grid(N=c(10,20,30), r=c(0, 0.1, 0.3)) %>%
  group_by(N, r) %>%
  expand(set=1:100) %>%  # create 100 of each combination

  # HERE! How to add an N x 2 tibble to each row?
  rowwise() %>%
  mutate(data=correlated_data(N, r)) %>%

  # Compute summary stats on each (for illustration only; not tested)
  mutate(
     stats = map(data, ~cor.test(.x[, 1], .x[, 2])),  # Correlation on each
     tidy_stats = map(stats, tidy))  # using broom package

I do have more parameters (N, r, distribution) and I will be computing more summaries. If alternative workflows are better, I welcome that as well.

Or `mutate(data = map2(N, r, correlated_data))`, and you don't need `rowwise`. — moodymudskipper, Aug 29 '18 at 22:33

score 0 · Accepted Answer · answered May 29 '19 at 13:38

This does it for two variables:

map2(N, r, correlated_data)

For more variables, use

pmap(list(N, r), correlated_data)
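
As a rough sketch of how pmap extends to a third parameter: the generator correlated_data3 and its mu2 argument (the mean of the second column) are hypothetical names made up for this illustration and are not part of the question. The list elements are unnamed here, so they are passed to the function by position.

```r
library(dplyr)
library(purrr)

# Hypothetical generator (not from the question): like correlated_data,
# but the mean of the second column is a third parameter, mu2.
correlated_data3 <- function(N, r, mu2) {
  MASS::mvrnorm(N, mu = c(0, mu2), Sigma = matrix(c(1, r, r, 1), ncol = 2))
}

# One row per parameter combination
params <- expand.grid(N = c(10, 20), r = c(0, 0.3), mu2 = c(0, 4))

# pmap walks the three columns in parallel and calls the generator once
# per row; the result is a list column of N x 2 matrices.
sims <- params %>%
  mutate(data = pmap(list(N, r, mu2), correlated_data3))
```

Naming the list elements, as in list(N = N, r = r, mu2 = mu2), should make pmap match them to the function's arguments by name instead of by position, which is less error-prone as the number of parameters grows.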

So the full procedure in the original question becomes:

expand.grid(N=c(10, 20, 30), r=c(0, 0.1, 0.3)) %>%
  group_by(N, r) %>%
  expand(set=1:100) %>%  # create 100 of each combination

  # Add an N x 2 matrix to each row as a list column
  mutate(
    data = map2(N, r, correlated_data),
    stats = map(data, ~cor.test(.[, 1], .[,2])),
    tidy_stats = map(stats, tidy)
  ) %>%  # using broom package

  unnest(tidy_stats)
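
Once tidy_stats is unnested, each simulated dataset contributes one row with the columns that broom's tidy returns for cor.test, including estimate and p.value. A possible next step, sketched here under assumptions, is to aggregate over the 100 sets per parameter combination. This version uses tidyr::crossing to build the grid instead of expand.grid, group_by and expand; that is a swapped-in alternative, not the answer's method. The names results and summary_tbl, the 0.05 threshold, and the seed are illustrative choices.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

correlated_data <- function(N, r) {
  MASS::mvrnorm(N, mu = c(0, 4), Sigma = matrix(c(1, r, r, 1), ncol = 2))
}

set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# crossing() builds every combination of N, r, and replicate index directly
results <- crossing(N = c(10, 20, 30), r = c(0, 0.1, 0.3), set = 1:100) %>%
  mutate(
    data = map2(N, r, correlated_data),
    stats = map(data, ~ cor.test(.x[, 1], .x[, 2])),
    tidy_stats = map(stats, tidy)
  ) %>%
  unnest(tidy_stats)

# Collapse the 100 replicates of each (N, r) into summary figures
summary_tbl <- results %>%
  group_by(N, r) %>%
  summarise(
    mean_estimate = mean(estimate),           # average sample correlation
    prop_significant = mean(p.value < 0.05),  # share of tests with p < 0.05
    .groups = "drop"
  )
```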
