apply(mtcars[,c('vs','am')],2,table)

produces

  vs am
0 18 19
1 14 13

but

lapply(mtcars[,c('vs','am')],table)

produces

$vs

 0  1 
18 14 

$am

 0  1 
19 13 

Can I force lapply to produce one table as apply does?

In the end I would like to compute means of several dependent variables, grouped by different independent variables that take identical values. I use lapply but do not want to have to perform a cbind at the end:

library(dplyr)

func.break <- function(indy){
  t(as.data.frame(mtcars 
                      %>% group_by(get(indy)) 
                      %>% summarise_at(depy, funs(mean))
                      )
   )
}

indy <- c('vs','am') 
depy <- c('mpg','qsec')

res.list <- lapply(indy,func.break)
res.list

[[1]]
              [,1]     [,2]
get(indy)  0.00000  1.00000
mpg       16.61667 24.55714
qsec      16.69389 19.33357

[[2]]
              [,1]     [,2]
get(indy)  0.00000  1.00000
mpg       17.14737 24.39231
qsec      18.18316 17.36000

cbind(as.data.frame(res.list[1]),as.data.frame(res.list[2]))

                X1       X2       X1       X2
get(indy)  0.00000  1.00000  0.00000  1.00000
mpg       16.61667 24.55714 17.14737 24.39231
qsec      16.69389 19.33357 18.18316 17.36000

I guess there is a more elegant way? How would apply work for this?

jay.sf
  • Does this help? `lapply(mtcars[,c('vs','am')],table) %>% do.call(rbind, .) %>% t()` – mhovd Feb 26 '21 at 12:18
  • @mhovd: That is horribly slow to do. R has `sapply`: `sapply(mtcars[,c('vs','am')],table)` – Julian_Hn Feb 26 '21 at 12:24
  • Thanks. But how can I adapt this for the above mentioned func.break (bivariate) instead of table (univariate)? Neither lapply(mtcars[,c(indy,depy)],func.break) %>% do.call(rbind, .) %>% t() nor sapply(mtcars[,c(indy,depy)],func.break) do work? – Luitpold Wienerle Feb 26 '21 at 13:16
  • @Julian_Hn You might be interested to see the timings in my answer below. – jay.sf Feb 26 '21 at 13:46
  • @LuitpoldWienerle you can use single backticks for inline code in comments. – Gregor Thomas Feb 26 '21 at 14:01

3 Answers


Using aggregate.

FUN <- function(x) t(with(mtcars, aggregate(mtcars[, depy], list(y=get(x)), mean)))
do.call(cbind.data.frame, lapply(indy, FUN))
#             1        2        1        2
# y     0.00000  1.00000  0.00000  1.00000
# mpg  16.61667 24.55714 17.14737 24.39231
# qsec 16.69389 19.33357 18.18316 17.36000
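If the repeated 1/2 column names are inconvenient, naming the list before the cbind makes base R prefix each block with its grouping variable. A small sketch reusing `FUN` and `indy` from above (`setNames` is base R; the `vs.`/`am.` prefixes come from `data.frame`'s column-naming of matrix arguments):

```r
depy <- c('mpg', 'qsec')
indy <- c('vs', 'am')
FUN <- function(x) t(with(mtcars, aggregate(mtcars[, depy], list(y=get(x)), mean)))

# A named list makes cbind.data.frame paste the name onto each column,
# so the result has columns vs.1, vs.2, am.1, am.2 instead of 1, 2, 1, 2.
res <- do.call(cbind.data.frame, setNames(lapply(indy, FUN), indy))
names(res)
# [1] "vs.1" "vs.2" "am.1" "am.2"
```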

Timings

(See discussion in comments.)

set.seed(42)
mtcars <- mtcars[sample(nrow(mtcars), 5e5, replace=TRUE), ]
library(magrittr)

microbenchmark::microbenchmark(
  list2DF=list2DF(lapply(mtcars[, c("vs", "am")], table)),
  apply=apply(mtcars[,c('vs','am')], 2, table),
  rbind.lapply=t(do.call(rbind, lapply(mtcars[,c('vs','am')], table))),
  sapply=sapply(mtcars[,c('vs','am')],table),
  pipe=lapply(mtcars[,c('vs','am')],table) %>% do.call(rbind, .) %>% t(),
  times=100L)
# Unit: seconds
#         expr      min       lq     mean   median       uq      max neval cld
#      list2DF 1.160465 1.170537 1.196288 1.202672 1.211612 1.278879   100  a 
#        apply 1.221822 1.264967 1.279215 1.270300 1.293056 1.391812   100   b
# rbind.lapply 1.163678 1.172187 1.198325 1.204445 1.214805 1.290071   100  a 
#       sapply 1.168295 1.174507 1.199146 1.207050 1.213422 1.315810   100  a 
#         pipe 1.167020 1.173511 1.203780 1.207331 1.215454 1.519427   100  a 
jay.sf
  • Excepting `apply` which does an extra class conversion, the median times range from `1.219` all the way up to `1.226` - a range of `0.007` seconds. It's almost as if which iteration method you choose doesn't matter ;) (adding some discussion in comments, since I didn't see it) – Gregor Thomas Feb 26 '21 at 14:03
  • @GregorThomas More spread in `sapply` though, probably because it's internally calling `lapply` (and not `.Internal(lapply())`). I was surprised that the pipe was so fast. – jay.sf Feb 26 '21 at 14:10
  • Good point on `sapply`'s spread, but still not enough to matter in 99.9% of use cases. – Gregor Thomas Feb 26 '21 at 14:19
  • @jay.sf Thanks for the benchmark. Very interesting. I'm pretty sure that at least for bigger data sets the `rbind` approach used to be really slow. Either something changed or I was just plain wrong. Although I still find the solution horrible from a readability standpoint, if time is not deeply critical. – Julian_Hn Feb 26 '21 at 15:12
  • thanks @jay.sf for suggesting `aggregate`, but I see that it also works with my original dplyr function: `do.call(cbind.data.frame, lapply(indy, func.break))`. But the do.call plus cbind thing is great for me! – Luitpold Wienerle Feb 26 '21 at 15:42

Try list2DF

list2DF(lapply(mtcars[, c("vs", "am")], table))

which gives

  vs am
1 18 19
2 14 13
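Worth noting: `list2DF` (added in R 4.0.0) simply lines the list elements up as columns, so it relies on them all having the same length. That holds here because `vs` and `am` each have two levels; a quick base-R check:

```r
tabs <- lapply(mtcars[, c("vs", "am")], table)
lengths(tabs)  # both tables have two cells, so list2DF can align them
# vs am 
#  2  2 
list2DF(tabs)
#   vs am
# 1 18 19
# 2 14 13
```

The row labels 1/2 here are plain row numbers; the table levels 0/1 are dropped, which is why the `apply` version above shows 0/1 and this one shows 1/2.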

Update

Or you can try as.data.frame

> as.data.frame(lapply(indy, func.break))
                X1       X2     X1.1     X2.1
get(indy)  0.00000  1.00000  0.00000  1.00000
mpg       16.61667 24.55714 17.14737 24.39231
qsec      16.69389 19.33357 18.18316 17.36000
ThomasIsCoding

I would suggest an alternative approach to func.break.

library(dplyr)

purrr::map_dfc(indy, ~mtcars %>% 
              group_by(.data[[.x]]) %>% 
              summarise(across(all_of(depy), mean, .names = '{.x}_{col}')))

#    vs vs_mpg vs_qsec    am am_mpg am_qsec
#  <dbl>  <dbl>   <dbl> <dbl>  <dbl>   <dbl>
#1     0   16.6    16.7     0   17.1    18.2
#2     1   24.6    19.3     1   24.4    17.4
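If a long layout is acceptable, a row-binding variant avoids the side-by-side duplication altogether. A sketch with `map_dfr`, using the same `indy`/`depy` (the `variable`/`value` column names are my own choice, not from the question):

```r
library(dplyr)

indy <- c('vs', 'am')
depy <- c('mpg', 'qsec')

# One block of rows per grouping variable, stacked instead of cbind-ed.
purrr::map_dfr(indy, ~ mtcars %>%
                 group_by(value = .data[[.x]]) %>%
                 summarise(across(all_of(depy), mean)) %>%
                 mutate(variable = .x, .before = 1))
# A tibble: 4 x 4 with columns variable, value, mpg, qsec
```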
Ronak Shah