Alternative in R to cbind loop when final size unknown

Question

For each element of my list, I need to compute all pairwise combinations of the values inside it and store them all together (b in the example). I am currently doing it as shown in my example, but for a bigger list it is really slow because of the cbind call inside the loop. Since I don't know the final size of b in advance, I can't preallocate it. I am looking for a more efficient alternative.

b=0
a = list(id1=c(1,2,3,4,5,6),id2=c(10,11,12))
for(i in 1:length(a)){
   temp=combn(a[[i]],2) 
   b=cbind(b,temp)
}

Even if you don't follow @akrun's advice (a one-liner solution), start with `b <- NULL`, not equal to 0. — Rui Barradas, Jun 20 '18 at 04:24
why don't you know the final size? it's `sum(choose(lengths(a), 2L))`? (plus one if you intentionally initialize `b = 0` instead of `b = NULL`) — MichaelChirico, Jun 20 '18 at 04:25
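
Following the two comments above, the column count can be computed before the loop, so the result can be preallocated and filled in place. This is a minimal sketch, assuming every element of `a` holds at least two values; the names `n_cols`, `idx`, and `pos` are illustrative and not from the original post.

```r
a <- list(id1 = c(1, 2, 3, 4, 5, 6), id2 = c(10, 11, 12))

# Total number of pairs across all list elements: choose(6, 2) + choose(3, 2)
n_cols <- sum(choose(lengths(a), 2L))

# Preallocate a 2-row matrix, then fill it one block of columns at a time
b <- matrix(NA_real_, nrow = 2L, ncol = n_cols)
pos <- 0L
for (x in a) {
  temp <- combn(x, 2)
  idx <- pos + seq_len(ncol(temp))  # columns this block occupies
  b[, idx] <- temp
  pos <- pos + ncol(temp)
}
```

Because `b` is allocated once, the loop no longer copies the whole matrix on every iteration, which is what makes repeated `cbind` slow.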

score 3 · Accepted Answer · answered Jun 20 '18 at 05:27

We can do this with base R

do.call(cbind, lapply(a, combn, 2))
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#[1,]    1    1    1    1    1    2    2    2    2     3     3     3     4     4
#[2,]    2    3    4    5    6    3    4    5    6     4     5     6     5     6
#     [,15] [,16] [,17] [,18]
#[1,]     5    10    10    11
#[2,]     6    11    12    12

akrun


I was expecting the `purrr::map` solution to be faster, but it isn't (I `microbenchmark`ed with various larger lists); just out of curiosity: Have you got an intuitive explanation why? Anyway +1. – Maurits Evers Jun 20 '18 at 05:33
@MauritsEvers The tidyverse functions are a convenient and tidy way to work, and efficiency is one of the side effects. So some of the functions may not be as efficient as base R or data.table. – akrun Jun 20 '18 at 05:37
@MauritsEvers Out of curiosity: the base R method returns a matrix, while the `map` call wrapped in `data.frame` returns a data.frame, which has some overhead from attributes. Were you comparing `lapply(a, combn, 2)` with `map(a, combn, 2)`? – akrun Jun 20 '18 at 05:41
Ah, I see what you mean. I was comparing `do.call(cbind, lapply(a, combn, 2))` with `map_dfc(a, ~data.frame(combn(.x, 2)))`. I was expecting the `do.call(cbind, ...)` part to be slower than `map_dfc`, but it isn't ;-) I guess `do.call(cbind, ...)` is actually quite fast because it's essentially a `matrix` operation, whereas `map_dfc` has the overhead, as you say, of converting to `data.frame`s. – Maurits Evers Jun 20 '18 at 05:46
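
The matrix-versus-data.frame point in this exchange can be checked roughly with base R alone. This is only a sketch: `big` is a hypothetical test list, and `as.data.frame` stands in for the per-element `data.frame` conversion, so it is not the `map_dfc` code itself. Timings depend on the machine, so none are shown.

```r
set.seed(1)
# Hypothetical test input: 50 vectors of 10 values each
big <- replicate(50, sample(1000, 10), simplify = FALSE)

# Matrix route: bind all 2-row matrices in a single call
t_mat <- system.time(
  m <- do.call(cbind, lapply(big, combn, 2))
)

# Data-frame route: convert each block to a data.frame before binding
t_df <- system.time(
  d <- do.call(cbind, lapply(big, function(x) as.data.frame(combn(x, 2))))
)

# Both routes hold the same values; only the container differs
stopifnot(all(m == as.matrix(d)))

# Elapsed seconds for each route
print(c(matrix = t_mat[["elapsed"]], data.frame = t_df[["elapsed"]]))
```

For stable numbers, the same two expressions could be passed to `microbenchmark`, as mentioned in the comments.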

Maurits Evers · Answer 2 · 2018-06-20T05:27:00.330

Here is an alternative to akrun's solution from the comments, using purrr::map:

data.frame(purrr::map(a, combn, 2))
#  id1.1 id1.2 id1.3 id1.4 id1.5 id1.6 id1.7 id1.8 id1.9 id1.10 id1.11 id1.12
#1     1     1     1     1     1     2     2     2     2      3      3      3
#2     2     3     4     5     6     3     4     5     6      4      5      6
#  id1.13 id1.14 id1.15 id2.1 id2.2 id2.3
#1      4      4      5    10    10    11
#2      5      6      6    11    12    12

Or

purrr::map_dfc(a, ~data.frame(combn(.x, 2)))
#  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X21 X31
#1  1  1  1  1  1  2  2  2  2   3   3   3   4   4   5  10  10  11
#2  2  3  4  5  6  3  4  5  6   4   5   6   5   6   6  11  12  12
