
I would like to process a list of lists. Specifically, I want to extract the data frame that is the third member of each list, group the data by a grouping variable (the first member of each list), and then apply several functions such as mean(), median(), sd() and length() to the data in each group. The output should be returned in a data frame and would look something like:

Grp   mean sd  ... 
 a    5.26 ... ...
 b    6.25 ... ...

#fake data
test<-list(
         #member 1=grouping var, 2=identity, 3=dataframe
         list("a", 54, data.frame(x=c(1,2)  ,y=c(3,4))),
         list("b", 55, data.frame(x=c(5,6)  ,y=c(7,8))),
         list("a", 56, data.frame(x=c(9 ,10),y=c(11,12))),
         list("b", 57, data.frame(x=c(13,14),y=c(15,NA)))
         )

#what I thought could work but kicks out a strange error

test2 <-ldply(test, .fun=unlist)
#note limited to just mean for now
tapply(test, factor(test$V1), FUN=function(x){mean(as.numeric(x[3:6]), na.rm=TRUE)}, simplify=TRUE)

So my questions are: 1. Why doesn't the above work? 2. This feels very clunky; is there a more efficient way to do this?

TBP
  • What are your desired results? – alistaire Nov 18 '16 at 22:27
  • What you're trying to accomplish is somewhat unclear, but maybe something like `library(tidyverse) ; test %>% map_df(~mutate(.x[[3]], grp = .x[[1]])) %>% group_by(grp) %>% summarise_all(mean, na.rm = TRUE)` – alistaire Nov 18 '16 at 22:33
  • edited to address your question re output. – TBP Nov 18 '16 at 22:35
  • So are you lumping `x` and `y` values together when taking `mean`/`sd`/etc.? – alistaire Nov 18 '16 at 22:38
  • yes, they are lumped – TBP Nov 18 '16 at 22:57
  • I'm not asking if they're the same; I'm asking if you're taking the mean of `x` and the mean of `y` or the mean of `x` and `y`. If the former, see above. If the latter, `test %>% map_df(~mutate(.x[[3]], grp = .x[[1]])) %>% gather(var, val, x, y) %>% group_by(grp) %>% summarise_at('val', funs(mean, sd), na.rm = TRUE)` or `test %>% map_df(~data.frame(val = unlist(.x[[3]]), grp = .x[[1]])) %>% group_by(grp) %>% summarise_all(funs(mean, sd), na.rm = TRUE)` – alistaire Nov 18 '16 at 23:00

1 Answer


In base R you can do:

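# Group the list elements by their first member, then row-bind each group's data frames
df_list <- tapply(test,
                  sapply(test, `[[`, 1),
                  FUN = function(x) do.call(rbind, lapply(x, `[[`, 3)))
# Pool x and y within each group and compute the summary statistics
t(sapply(df_list, function(x) {
  v <- unlist(x)
  list("mean"   = mean(v, na.rm = TRUE),
       "sd"     = sd(v, na.rm = TRUE),
       "median" = median(v, na.rm = TRUE))}))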

  mean     sd       median
a 6.5      4.440077 6.5   
b 9.714286 4.151879 8   
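
Note that the result above is a matrix whose cells are list elements, not a data frame with numeric columns. If you need an actual data frame, as the question asks, one possible variation is sketched below. This is not part of the code above; the names `grp`, `vals` and `res` are chosen here for illustration only, and the `n` column assumes that "length" should count non-missing values (use `length` instead if NAs should be counted).

```r
# Same fake data as in the question
test <- list(
  list("a", 54, data.frame(x = c(1, 2),   y = c(3, 4))),
  list("b", 55, data.frame(x = c(5, 6),   y = c(7, 8))),
  list("a", 56, data.frame(x = c(9, 10),  y = c(11, 12))),
  list("b", 57, data.frame(x = c(13, 14), y = c(15, NA)))
)

# Grouping variable: the first member of each inner list
grp <- vapply(test, function(el) el[[1]], character(1))

# Flatten each data frame (x and y lumped together), then pool the values per group
vals <- split(lapply(test, function(el) unlist(el[[3]], use.names = FALSE)), grp)
vals <- lapply(vals, unlist)

# One row per group, with ordinary numeric columns
res <- data.frame(
  Grp    = names(vals),
  mean   = vapply(vals, mean,   numeric(1), na.rm = TRUE),
  sd     = vapply(vals, sd,     numeric(1), na.rm = TRUE),
  median = vapply(vals, median, numeric(1), na.rm = TRUE),
  n      = vapply(vals, function(v) sum(!is.na(v)), integer(1)),  # assumption: non-missing count
  stringsAsFactors = FALSE
)
rownames(res) <- NULL
res
```

Using `vapply` rather than `sapply` makes each column's type explicit, so a group that unexpectedly returns something other than a single number fails loudly instead of silently changing the shape of the result.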
HubertL