R mutate a nested tibble

Question

I would like to join a two-column tibble to another tibble, as follows:

library(tidyverse)
set.seed(1)
tib1 <- tibble(locid = seq(2))
tib2 <- tibble(x=runif(1), y = x * 2)

I've tried the following:

tib3 <- tib1 %>%
  mutate(z = list(tib2)) %>%
  unnest(z)

However, this produces:

locid         x         y
    1 0.2655087 0.5310173
    2 0.2655087 0.5310173

That is, the values are repeated. I'd like tib2 to be resampled for each row. How can I do this?

The expected output would be:

locid         x         y
    1 0.2655087 0.5310173
    2 0.1848823 0.3697645

Thank you very much.
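
The values repeat because `tib2` is evaluated once, when it is created; `list(tib2)` then recycles that same one-row tibble for every row of `tib1`. A minimal sketch of one way to draw a fresh sample per row follows, assuming tidyr 1.0 or later. The helper name `make_tib2` is hypothetical and not part of the question, and the second value it draws depends on the random-number state, so it need not match the expected output shown above.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)

set.seed(1)
tib1 <- tibble(locid = seq(2))

# Hypothetical helper (not from the question): builds a new one-row
# tibble on every call, so each call draws a new random number.
make_tib2 <- function() {
  x <- runif(1)
  tibble(x = x, y = x * 2)
}

tib3 <- tib1 %>%
  mutate(z = map(locid, ~ make_tib2())) %>%  # one fresh tibble per row
  unnest(z)                                  # expand the list-column into x and y

tib3
```

This calls the helper once per row, which is convenient but may be slow for very large inputs.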

Since `tib2` contains only one row you cannot join it to `tib1` and expect it to grow an additional row. — Cettt, Apr 12 '19 at 20:47
Do you need to create tib2? You could just run that code within tib3, i.e. `library(tidyverse); set.seed(1); tib1 <- tibble(locid = seq(2)); tib3 <- tib1 %>% rowwise() %>% mutate(x = runif(1), y = x * 2) %>% unnest`. I don't get the same second row that you have, though; I'm not sure what you ran after setting your seed. — Amanda, Apr 12 '19 at 20:48
It seems like what you're asking for is equivalent to `tib1 %>% mutate(x = runif(nrow(tib1)), y = x * 2)`. — eipi10, Apr 12 '19 at 20:49
@Amanda Yes, I need to create tib2 (I simplified my problem) but you put me on the right track. If I use `rowwise` and `list` that does the trick. — Vincent Risington, Apr 12 '19 at 21:18
If there's a one-to-one correspondence between the row number of each tibble, then you could add the row numbers as a column to each tibble and join: `tib1 %>% rownames_to_column() %>% left_join(tib2 %>% rownames_to_column())`, or just do `bind_cols(tib1, tib2)`, but it doesn't seem like either approach is necessary here. — eipi10, Apr 12 '19 at 21:22
@eipi10 - my main concern is performance. Previously I was using `bind` but once I add it to the main tibble I apply a function to it, so wanted to do it in a single step. I've been reading up on the `rowwise` / `group_by` solution and while it works, it's slow: https://community.rstudio.com/t/dplyr-alternatives-to-rowwise/8071/3 — Vincent Risington, Apr 12 '19 at 21:29
We might be able to provide additional advice on performance if you provide more context on what you're trying to accomplish. There may be a vectorized solution that's much faster than `rowwise()` or its alternatives. — eipi10, Apr 12 '19 at 21:35
Hmm, maybe `cbind` is the best way (I assume it's a vectorized approach?) `tib1 <- tibble(locid = seq(n)); tib2 <- tibble(x=runif(tib1 %>% nrow), y = x * 2); tib3 <- tib1 %>% cbind(tib2)` — Vincent Risington, Apr 12 '19 at 21:50
The real-life problem has millions of rows and would be implemented as follows: `library(copula); tib3 <- tib1 %>% cbind(rCopula(tib1 %>% nrow, gumbelCopula(2.7)) %>% as_tibble)` — Vincent Risington, Apr 12 '19 at 21:53
Yes, that should be very fast, even with multi-million-row objects. My laptop is relatively new and fast, but for reference, I ran the copula code to generate 10 million values in 3.3 seconds. The `cbind` operation took 0.13 seconds. — eipi10, Apr 12 '19 at 22:01
Thanks for checking. I also noticed you mentioned the `bind_cols` function which may be faster still... Sorry, what did you use to measure performance? — Vincent Risington, Apr 12 '19 at 22:06
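
The vectorized approach discussed in the comments above can be sketched as follows, with base R's `system.time()` as one common way to time it. This is an assumption about a reasonable timing method, not necessarily the tool used for the figures quoted above, and the row count `n` is illustrative.

```r
library(dplyr)
library(tibble)

set.seed(1)
n <- 1e6  # illustrative size, not the asker's real row count
tib1 <- tibble(locid = seq_len(n))

# Draw all random values in one vectorized call, then bind column-wise.
timing <- system.time({
  tib2 <- tibble(x = runif(nrow(tib1)), y = x * 2)
  tib3 <- bind_cols(tib1, tib2)
})

print(timing)  # elapsed time in seconds; varies by machine
head(tib3)
```

Because `runif()` is called once for all rows, no per-row function call is involved, which is why this tends to scale better than `rowwise()`.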
