6

I can use do.call to sum two vectors elementwise:

do.call(what = "+", args = list(c(0,0,1), c(1,2,3)))
>[1] 1 2 4

However, if I'd like to call the same operator with a list of three vectors, it fails:

do.call(what = "+", args = list(c(0,0,1), c(1,2,3), c(9,1,2)))
>Error in `+`(c(0, 0, 1), c(1, 2, 3), c(9, 1, 2)): operator needs one or two arguments
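
For context, a minimal sketch of why this happens: do.call passes each list element as a separate argument, so the failing call is the same as `+`(a, b, c), and `+` is only unary or binary. The names a, b and c3 below are illustrative only.

```r
a  <- c(0, 0, 1)
b  <- c(1, 2, 3)
c3 <- c(9, 1, 2)   # named c3 so that it does not mask c()

# Two arguments: do.call builds the call `+`(a, b)
two <- do.call("+", list(a, b))

# Three arguments: `+` refuses; capture the error message instead of stopping
three <- tryCatch(do.call("+", list(a, b, c3)),
                  error = function(e) conditionMessage(e))

# A variadic function such as sum() accepts all three vectors, but it
# collapses them to a single scalar instead of adding elementwise
total <- do.call(sum, list(a, b, c3))
```

So do.call only helps when the target function itself accepts an arbitrary number of arguments and is elementwise; `+` satisfies the second condition but not the first.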

I could use Reduce

Reduce(f = "+", x = list(c(0,0,1), c(1,2,3), c(9,1,2)))
>[1] 10  3  6

but I am concerned about the overhead of Reduce compared to do.call, which would not be tolerable in my real application: there I need to sum not a 3-element list, but a list of 10^5 vectors, each 10^4 elements long.

UPD: Reduce turned out to be the fastest method, after all...

library(data.table)  # provides transpose()

lst <- list(1:10000, 10001:20000, 20001:30000)
lst2 <- lst[rep(seq.int(length(lst)), 1000)]
microbenchmark::microbenchmark(colSums(do.call(rbind, lst2)),
                               vapply(transpose(lst2), sum, 0),
                               Reduce(f = "+", x = lst2))

    Unit: milliseconds
                           expr      min       lq     mean   median       uq       max neval cld
   colSums(do.call(rbind, lst2)) 153.5086 194.9139 222.7954 198.1952 201.8152  915.6354   100  b 
 vapply(transpose(lst2), sum, 0) 398.9424 537.3834 732.4747 781.7255 813.7376 1538.4301   100   c
       Reduce(f = "+", x = lst2) 101.5618 105.5864 139.8651 108.1204 112.7861 2567.1793   100 a  
Emile Zäkiev
  • nothing to do with `do.call` -- `+` only works like `+x` or `x+y` – MichaelChirico Aug 05 '20 at 11:57
  • if Reduce doesn't work for you, what's wrong with using a for loop for this case? unless your input is already in a matrix, in which case colSums/rowSums is what you want – MichaelChirico Aug 05 '20 at 11:58
  • if you want to build a call it'll have to be in "Polish" form like `+(x1, +(x2, ..., +(x[n-1], xn)...))`. doable but a mess; for loop should have the same performance – MichaelChirico Aug 05 '20 at 12:01
  • my concern with loop was its (alleged) poor performance as compared to the classical apply-styles of functions – Emile Zäkiev Aug 05 '20 at 12:21
  • For the sake of completeness, I would do as MichaelChirico suggested and benchmark a loop. It'll be more efficient than you expect (if done correctly). – Ritchie Sacramento Aug 05 '20 at 13:28
  • @27ϕ9 I did this out of curiosity. Same benchmark as before (with vector length = 10000). My `for` loop is virtually identical in time to `Reduce`, both of which are faster than the other two methods. –  Aug 05 '20 at 15:13
  • questions of efficiency mean looking at the whole workflow. if your input is already a list of inputs, actually I think do.call(psum, inputs) would be best. apply works best on matrices, etc – MichaelChirico Aug 05 '20 at 15:17
  • ah, I forgot psum is something I wrote; keep an eye on this pull request, eventually data.table could handle your case directly: https://github.com/Rdatatable/data.table/pull/4448 – MichaelChirico Aug 05 '20 at 15:21
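
The two alternatives raised in the comments above, a plain for loop and a nested call, can be sketched as follows. This is only an illustration under the assumption that all vectors have equal length; sum_list_loop is a hypothetical helper name, not a function from any package, and the nested call is built left to right, ((x1 + x2) + x3), rather than in the right-nested form written in the comment.

```r
# Hypothetical helper: elementwise sum of a list of equal-length numeric vectors
sum_list_loop <- function(lst) {
  acc <- numeric(length(lst[[1]]))  # accumulator of zeros, same length as the vectors
  for (v in lst) {
    acc <- acc + v                  # one binary `+` per list element
  }
  acc
}

lst <- list(c(0, 0, 1), c(1, 2, 3), c(9, 1, 2))
loop_result <- sum_list_loop(lst)

# Build one nested call object out of binary `+` calls, then evaluate it once
nested_call <- Reduce(function(x, y) call("+", x, y), lst)
call_result <- eval(nested_call)
```

Both variants still perform one binary addition per list element, which fits the observation in the comments that the loop timed about the same as Reduce.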

3 Answers

5

You could use:

colSums(do.call(rbind, lst))
#[1] 10  3  6

Or similarly:

rowSums(do.call(cbind, lst))

where lst is:

lst <- list(c(0,0,1), c(1,2,3), c(9, 1, 2))
Ronak Shah
  • Thanks, I was also considering something like that. I wonder about the overhead generated by rbind'ing or cbind'ing the vectors into a matrix first. I'll get back with some tests. – Emile Zäkiev Aug 05 '20 at 12:13
5

As your list gets larger, you might find that this becomes the faster option:

# careful if you use the tidyverse that purrr does not mask transpose
library(data.table) 

lst <- list(c(0,0,1), c(1,2,3), c(9, 1, 2))

vapply(transpose(lst), sum, 0)
# [1] 10  3  6

I have taken a few answers to compare speed, which seems to be what you want.

# make the list a bit bigger...
lst2 <- lst[rep(seq.int(length(lst)), 1000)]

microbenchmark::microbenchmark(Reduce(`+`, lst2),
                               colSums(do.call(rbind, lst2)),
                               vapply(transpose(lst2), sum, 0),
                               eval(str2lang(paste0(lst2, collapse = "+"))))

Unit: microseconds
                                         expr     min       lq      mean   median       uq     max neval
                            Reduce(`+`, lst2)   954.9  1088.10  1341.271  1191.05  1389.00  6923.2   100
                colSums(do.call(rbind, lst2))   402.2   474.80   761.473   538.85   843.75  7079.7   100
              vapply(transpose(lst2), sum, 0)    81.9    91.85   110.455   103.90   119.00   330.4   100
 eval(str2lang(paste0(lst2, collapse = "+"))) 17489.2 18466.65 20767.888 19572.25 20809.80 57770.4   100

Here is the same comparison with longer vectors, as in your use case. This benchmark takes a minute or two to run, and note that the unit is now milliseconds. With these inputs colSums comes out ahead of vapply, so I think the result will depend on how long the list is.

lst <- list(1:10000, 10001:20000, 20001:30000)
lst2 <- lst[rep(seq.int(length(lst)), 1000)]

microbenchmark::microbenchmark(colSums(do.call(rbind, lst2)),
                               vapply(transpose(lst2), sum, 0))

Unit: milliseconds
                            expr      min       lq     mean   median       uq      max neval
   colSums(do.call(rbind, lst2)) 141.7147 146.6305 188.5108 163.4915 228.7852 270.5679   100
 vapply(transpose(lst2), sum, 0) 261.8630 335.6093 348.6241 341.6958 348.6404 495.0994   100
  • Thanks for running these benchmarks, I'll go with the vapply solution, surely! But why is vapply so much faster anyway?! Would be interesting to find out... – Emile Zäkiev Aug 05 '20 at 12:44
  • I think you need to consider the length of the vectors inside the list as this will have a dramatic impact on the benchmarks. OP can clarify but maybe these vectors are length 3, maybe they're 5000 or whatever. – Ritchie Sacramento Aug 05 '20 at 12:44
  • they're length 10k, thanks for asking, added it to the main post – Emile Zäkiev Aug 05 '20 at 12:45
  • Actually, it might not be. Just reran with longer vectors, about to update. –  Aug 05 '20 at 12:51
0

Another base R workaround

rowSums(as.data.frame(lst))

or

eval(str2lang(paste0(lst,collapse = "+")))

which gives

[1] 10  3  6

Data

lst <- list(c(0,0,1), c(1,2,3), c(9, 1, 2))
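
To make the second workaround less opaque, here is a small sketch of its intermediate steps (the variable names expr_text and result are illustrative only). paste0 turns each list element into its deparsed text, str2lang parses the joined string into a single call, and eval runs it.

```r
lst <- list(c(0, 0, 1), c(1, 2, 3), c(9, 1, 2))

# Each vector is converted to source text and the pieces are joined with "+"
expr_text <- paste0(lst, collapse = "+")
print(expr_text)
# [1] "c(0, 0, 1)+c(1, 2, 3)+c(9, 1, 2)"

# Parse the text into a call object and evaluate it
result <- eval(str2lang(expr_text))
```

Because every number makes a round trip through text, this is likely to be slow for long vectors, which is consistent with it being the slowest entry in the benchmark in the other answer; it may also lose precision for values that do not print exactly.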
ThomasIsCoding