I would like to join a two-dimensional tibble to another one, as follows:

library(tidyverse)
set.seed(1)
tib1 <- tibble(locid = seq(2))
tib2 <- tibble(x=runif(1), y = x * 2)

I've tried the following:

tib3 <- tib1 %>%
    mutate(z = list(tib2)) %>%
    unnest(z)

However, this produces:

locid   x           y
1       0.2655087   0.5310173
2       0.2655087   0.5310173

That is, the same values are repeated on every row. I'd like `tib2` to be resampled for each row. How can I do this?

The expected output would be:

locid   x           y
1       0.2655087   0.5310173
2       0.1848823   0.3697645

Thank you very much.
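
For reference, a minimal sketch of why the values repeat and one possible way to resample per row. `tib2` is evaluated once, so `list(tib2)` stores the same one-row tibble on every row of `tib1`. Wrapping the construction in a function and calling it once per row draws fresh random numbers each time. The helper name `make_tib2()` is hypothetical and not part of the original post, and the second row it produces will not necessarily match the expected output shown above.

```r
library(tidyverse)

set.seed(1)
tib1 <- tibble(locid = seq(2))

# Hypothetical helper (not in the original post): builds a fresh one-row
# tibble on every call, so each call consumes new random numbers.
make_tib2 <- function() {
  tibble(x = runif(1), y = x * 2)
}

tib3 <- tib1 %>%
  mutate(z = map(locid, ~ make_tib2())) %>%  # one call per row of tib1
  unnest(z)                                  # expand the list column into x and y

print(tib3)
```

This keeps `tib2`-style construction separate from `tib1`, but it still calls the helper once per row, so it is unlikely to scale well to very large tibbles.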

  • What is the expected output? – Sonny Apr 12 '19 at 20:26
  • @Sonny, I've edited to show the expected output. – Vincent Risington Apr 12 '19 at 20:37
  • Since `tib2` contains only one row you cannot join it to `tib1` and expect it to grow an additional row. – Cettt Apr 12 '19 at 20:47
  • Do you need to create tib2? You could just run that code within tib3, i.e. `library(tidyverse); set.seed(1); tib1 <- tibble(locid = seq(2)); tib3 <- tib1 %>% rowwise() %>% mutate(x = runif(1), y = x * 2) %>% unnest`. I don't get the same second row that you have, though; I'm not sure what you ran after setting your seed. – Amanda Apr 12 '19 at 20:48
  • It seems like what you're asking for is equivalent to `tib1 %>% mutate(x = runif(nrow(tib1)), y = x * 2)`. – eipi10 Apr 12 '19 at 20:49
  • @Amanda Yes, I need to create tib2 (I simplified my problem) but you put me on the right track. If I use `rowwise` and `list` that does the trick. – Vincent Risington Apr 12 '19 at 21:18
  • If there's a one-to-one correspondence between the row numbers of the two tibbles, then you could add the row numbers as a column to each tibble and join: `tib1 %>% rownames_to_column() %>% left_join(tib2 %>% rownames_to_column())`, or just do `bind_cols(tib1, tib2)`, but it doesn't seem like either approach is necessary here. – eipi10 Apr 12 '19 at 21:22
  • @eipi10 - my main concern is performance. Previously I was using `bind` but once I add it to the main tibble I apply a function to it, so wanted to do it in a single step. I've been reading up on the `rowwise` / `group_by` solution and while it works, it's slow: https://community.rstudio.com/t/dplyr-alternatives-to-rowwise/8071/3 – Vincent Risington Apr 12 '19 at 21:29
  • We might be able to provide additional advice on performance if you provide more context on what you're trying to accomplish. There may be a vectorized solution that's much faster than `rowwise()` or its alternatives. – eipi10 Apr 12 '19 at 21:35
  • Hmm, maybe `cbind` is the best way (I assume it's a vectorized approach?) `tib1 <- tibble(locid = seq(n)); tib2 <- tibble(x=runif(tib1 %>% nrow), y = x * 2); tib3 <- tib1 %>% cbind(tib2)` – Vincent Risington Apr 12 '19 at 21:50
  • The real-life problem has millions of rows and would be implemented as follows: `library(copula); tib3 <- tib1 %>% cbind(rCopula(tib1 %>% nrow, gumbelCopula(2.7)) %>% as_tibble)` – Vincent Risington Apr 12 '19 at 21:53
  • Yes, that should be very fast, even with multi-million-row objects. My laptop is relatively new and fast, but for reference, I ran the copula code to generate 10 million values in 3.3 seconds. The `cbind` operation took 0.13 seconds. – eipi10 Apr 12 '19 at 22:01
  • Thanks for checking. I also noticed you mentioned the `bind_cols` function which may be faster still... Sorry, what did you use to measure performance? – Vincent Risington Apr 12 '19 at 22:06
  • It's actually slower (at least in this case). – eipi10 Apr 12 '19 at 22:06
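
The vectorized approach discussed in the comments can be sketched roughly as follows: draw all the random values in one call, then bind the columns. The row count `n` is an illustrative value, and `system.time()` is only one base-R way to time the step; the thread does not say which tool was actually used, so treat that choice as an assumption.

```r
library(tidyverse)

set.seed(1)
n <- 1e6  # illustrative size, not taken from the thread
tib1 <- tibble(locid = seq(n))

# Draw all n values in a single vectorized call instead of row by row.
tib2 <- tibble(x = runif(n), y = x * 2)

# system.time() is an assumed choice of timer; the thread does not name one.
time_cbind     <- system.time(res_cbind <- cbind(tib1, tib2))
time_bind_cols <- system.time(res_bind_cols <- bind_cols(tib1, tib2))

print(time_cbind)
print(time_bind_cols)
```

Note that `cbind()` on two tibbles returns a plain data frame, while `bind_cols()` returns a tibble. Timings will vary by machine and package version, so it is worth measuring both on your own data.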
