Processing lists of lists by group

Question

I would like to process a list of lists. Specifically, I want to pool the data frames that are the third member of each inner list by a grouping variable (the first member of each inner list), and then apply several functions such as mean(), median(), sd() and length() to the data in each group. The output should be returned as a data frame and would look something like:

Grp   mean sd  ... 
 a    5.26 ... ...
 b    6.25 ... ...

#fake data
test<-list(
         #member 1=grouping var, 2=identity, 3=dataframe
         list("a", 54, data.frame(x=c(1,2)  ,y=c(3,4))),
         list("b", 55, data.frame(x=c(5,6)  ,y=c(7,8))),
         list("a", 56, data.frame(x=c(9 ,10),y=c(11,12))),
         list("b", 57, data.frame(x=c(13,14),y=c(15,NA)))
         )

#what I thought could work but kicks out a strange error

library(plyr)  # provides ldply()
test2 <- ldply(test, .fun=unlist)
#note limited to just mean for now
tapply(test, factor(test$V1), FUN=function(x){mean(as.numeric(x[3:6]), na.rm=TRUE)}, simplify=TRUE)

So my questions are: 1. Why doesn't the above work? 2. This feels very clunky; is there a more efficient way to do this?
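On the first question, one likely cause can be checked directly: the `tapply()` call is given `test` (the original unnamed list), not `test2`, so `test$V1` is `NULL` and the grouping factor has length zero. The sketch below is only a diagnostic under that reading; it uses the fake data from the question, swaps in `length` as a stand-in for the summary function, and the name `err` is illustrative.

```r
# Fake data from the question
test <- list(
  list("a", 54, data.frame(x = c(1, 2),   y = c(3, 4))),
  list("b", 55, data.frame(x = c(5, 6),   y = c(7, 8))),
  list("a", 56, data.frame(x = c(9, 10),  y = c(11, 12))),
  list("b", 57, data.frame(x = c(13, 14), y = c(15, NA)))
)

# `test` is an unnamed list, so it has no element called V1
is.null(test$V1)

# factor(NULL) has length 0, while `test` has 4 elements
length(factor(test$V1))

# Capture the resulting error message instead of stopping the script
err <- tryCatch(
  tapply(test, factor(test$V1), FUN = length),
  error = function(e) conditionMessage(e)
)
err
```

If that is the cause, `tapply()` stops before the summary function is ever called, because the index and the data differ in length.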

What you're trying to accomplish is somewhat unclear, but maybe something like `library(tidyverse) ; test %>% map_df(~mutate(.x[[3]], grp = .x[[1]])) %>% group_by(grp) %>% summarise_all(mean, na.rm = TRUE)` — alistaire, Nov 18 '16 at 22:33
So are you lumping `x` and `y` values together when taking `mean`/`sd`/etc.? — alistaire, Nov 18 '16 at 22:38
I'm not asking if they're the same; I'm asking if you're taking the mean of `x` and the mean of `y` or the mean of `x` and `y`. If the former, see above. If the latter, `test %>% map_df(~mutate(.x[[3]], grp = .x[[1]])) %>% gather(var, val, x, y) %>% group_by(grp) %>% summarise_at('val', funs(mean, sd), na.rm = TRUE)` or `test %>% map_df(~data.frame(val = unlist(.x[[3]]), grp = .x[[1]])) %>% group_by(grp) %>% summarise_all(funs(mean, sd), na.rm = TRUE)` — alistaire, Nov 18 '16 at 23:00
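The two readings distinguished in the comments above give different numbers, which a short base R sketch can make concrete. It assumes the fake data from the question; the names `stacked`, `per_column` and `pooled` are illustrative and do not come from the original posts.

```r
# Fake data from the question
test <- list(
  list("a", 54, data.frame(x = c(1, 2),   y = c(3, 4))),
  list("b", 55, data.frame(x = c(5, 6),   y = c(7, 8))),
  list("a", 56, data.frame(x = c(9, 10),  y = c(11, 12))),
  list("b", 57, data.frame(x = c(13, 14), y = c(15, NA)))
)

# Attach the group label to each data frame, then stack them row-wise
stacked <- do.call(rbind, lapply(test, function(el) {
  data.frame(grp = el[[1]], el[[3]])
}))

# Reading 1: mean of x and mean of y separately, per group
per_column <- aggregate(stacked[c("x", "y")],
                        by = list(grp = stacked$grp),
                        FUN = mean, na.rm = TRUE)

# Reading 2: x and y lumped together, per group
# unlist() yields all x values followed by all y values,
# so the group labels are repeated once per column
pooled <- tapply(unlist(stacked[c("x", "y")]),
                 rep(stacked$grp, 2),
                 mean, na.rm = TRUE)

per_column
pooled
```

With this data, the per-column means for group `a` are 5.5 (`x`) and 7.5 (`y`), while the pooled mean for the same group is 6.5.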

score 3 · Accepted Answer · answered Nov 18 '16 at 22:49

In base R you can do:

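# Pool the data frames (third member) within each group (first member)
df_list <- tapply(test, 
                  sapply(test, `[[`, 1), 
                  FUN = function(x) do.call(rbind, lapply(x, `[[`, 3)))
# Summarise all values (x and y together) in each group
t(sapply(df_list, function(x) {
  list("mean"   = mean(unlist(x), na.rm = TRUE),
       "sd"     = sd(unlist(x), na.rm = TRUE),
       "median" = median(unlist(x), na.rm = TRUE))}))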

  mean     sd       median
a 6.5      4.440077 6.5   
b 9.714286 4.151879 8
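Because the inner function returns a list, the result above is a matrix whose cells are list elements and not a data frame. If an ordinary data frame with numeric columns is wanted, as the question describes, a variant along these lines may work. It is a sketch, not part of the original answer: the `Grp` and `n` columns and the helper names `values`, `pooled` and `result` are assumptions made for illustration, and `n` counts non-missing values only.

```r
# Fake data from the question
test <- list(
  list("a", 54, data.frame(x = c(1, 2),   y = c(3, 4))),
  list("b", 55, data.frame(x = c(5, 6),   y = c(7, 8))),
  list("a", 56, data.frame(x = c(9, 10),  y = c(11, 12))),
  list("b", 57, data.frame(x = c(13, 14), y = c(15, NA)))
)

# Group label (first member) and flattened values (third member) per element
grp    <- vapply(test, function(el) el[[1]], character(1))
values <- lapply(test, function(el) unlist(el[[3]], use.names = FALSE))

# One numeric vector per group, x and y pooled together
pooled <- split(unlist(values), rep(grp, lengths(values)))

# One row per group, one numeric column per statistic
result <- data.frame(
  Grp    = names(pooled),
  mean   = vapply(pooled, mean,   numeric(1), na.rm = TRUE),
  sd     = vapply(pooled, sd,     numeric(1), na.rm = TRUE),
  median = vapply(pooled, median, numeric(1), na.rm = TRUE),
  n      = vapply(pooled, function(v) sum(!is.na(v)), integer(1)),
  row.names = NULL
)
result
```

The mean, sd and median columns should match the values printed above; further statistics can be added as extra `vapply()` lines.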

answered Nov 18 '16 at 22:49

HubertL


that would do it. Thank you! – TBP Nov 18 '16 at 22:56
