As the title says, I want to keep the names of the columns that were processed by lapply and tapply in R. A simple example:

df<-data.frame('X1'=rnorm(100),
               'X2'=rnorm(100),
               'X3'=c(c(rep('A',50)),c(rep('B',50))))


var<-c('X1','X2')
plyr::ldply(lapply(var, function(v) {
  tapply(df[,v],df$X3,mean)
}),rbind)

which results in:

            A          B
1 -0.06856352 0.08608197
2 -0.23585510 0.01551267

from which I cannot tell whether row 1 comes from 'X1' or 'X2'. What I want is:

            A          B
X1 -0.06856352 0.08608197
X2 -0.23585510 0.01551267

In this example we could do a simple manual check, or boldly guess that row 1 comes from 'X1'. However, this becomes tedious and risky when there are many more variables and the function is much more complex than mean.

Does anyone know how to achieve this? Your time and knowledge would be deeply appreciated. Thanks in advance.

Jia Gao
  • Why not just `aggregate(. ~ X3, df, mean)` or `t(aggregate(. ~ X3, df, mean)[,-1])`? – Sotos Aug 13 '17 at 11:33
  • Or `group_by(df, X3) %>% summarise_each(funs(mean))` – coffeinjunky Aug 13 '17 at 11:37
  • Thanks Sotos and coffeinjunky, both of your comments bring me one step closer to the solution I want. The aggregate, group_by, and summarise_each functions give me new tools for problems like this; I have been relying too much on the apply functions. – Jia Gao Aug 13 '17 at 12:58
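
The two base R routes touched on in these comments can be sketched side by side. This is a minimal sketch, assuming the `df` and `var` from the question; the seed and the names `by_apply`, `agg`, and `by_aggregate` are illustrative additions. Iterating with `sapply` over the character vector `var` keeps the variable names, which is what the `lapply` version loses.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
df <- data.frame(X1 = rnorm(100),
                 X2 = rnorm(100),
                 X3 = rep(c("A", "B"), each = 50))
var <- c("X1", "X2")

# apply-family route: sapply() over a character vector uses its elements
# as names, so each column of the result is labelled X1 or X2
by_apply <- t(sapply(var, function(v) tapply(df[[v]], df$X3, mean)))

# aggregate() route from the comments: one row per group, then transpose
agg <- aggregate(. ~ X3, data = df, FUN = mean)
by_aggregate <- t(agg[, var])
colnames(by_aggregate) <- as.character(agg$X3)

by_apply
by_aggregate
```

Both objects should be matrices with rows `X1`, `X2` and columns `A`, `B`, so the origin of each row no longer has to be guessed.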

2 Answers

Just to flesh out my comment: many people like to do Split-Apply-Combine operations with dplyr. See e.g. the following:

library(dplyr)

set.seed(1)
df<-data.frame('X1'=rnorm(100),
               'X2'=rnorm(100),
               'X3'=c(c(rep('A',50)),c(rep('B',50))))

var<-c('X1','X2')

out <- df %>% group_by(X3) %>% select_(.dots = var) %>%  summarise_each(funs(mean))
out

# A tibble: 2 × 3
      X3        X1          X2
  <fctr>     <dbl>       <dbl>
1      A 0.1004483 -0.15248544
2      B 0.1173265  0.07686929

If you want to have more functions applied, or more complicated functions applied, it works the same way. For instance, to apply two functions:

df %>% group_by(X3) %>% select_(.dots = var) %>%  summarise_each(funs(mean, sd))

# A tibble: 2 × 5
      X3   X1_mean     X2_mean     X1_sd     X2_sd
  <fctr>     <dbl>       <dbl>     <dbl>     <dbl>
1      A 0.1004483 -0.15248544 0.8313939 0.8997394
2      B 0.1173265  0.07686929 0.9688279 1.0086725

You can then easily transpose the outcome if you really wish to do so.

transposed <- t(out[,-1])
colnames(transposed) <- t(out[,1])
transposed
            A          B
X1  0.1004483 0.11732645
X2 -0.1524854 0.07686929
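
In dplyr 1.0.0 and later, `select_()`, `summarise_each()`, and `funs()` are deprecated in favour of `across()`. A minimal sketch of the same summaries under that assumption follows; the object names `both` and `transposed`, and the `list(mean = mean, sd = sd)` labels, are illustrative choices rather than anything the code above requires.

```r
library(dplyr)

set.seed(1)
df <- data.frame(X1 = rnorm(100),
                 X2 = rnorm(100),
                 X3 = rep(c("A", "B"), each = 50))
var <- c("X1", "X2")

# One function: all_of() selects the columns named in the character vector
out <- df %>%
  group_by(X3) %>%
  summarise(across(all_of(var), mean))

# Several functions: a named list yields columns such as X1_mean and X1_sd
both <- df %>%
  group_by(X3) %>%
  summarise(across(all_of(var), list(mean = mean, sd = sd)))

# Transpose so that the variables become labelled rows
transposed <- t(as.data.frame(out[, -1]))
colnames(transposed) <- as.character(out$X3)
transposed
```

With `across()` the multi-function columns are grouped per variable (`X1_mean`, `X1_sd`, `X2_mean`, `X2_sd`) rather than per function.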
coffeinjunky
  • Thanks for your reply, @coffeinjunky. Your code solves my problem perfectly, and the detailed illustrations are very helpful, especially the multiple functions you mentioned. Unfortunately, akrun's answer also works and was posted a little earlier, so I accepted his. I'll mark yours as useful; I would like to accept it too, but Stack Overflow doesn't allow me to do that. – Jia Gao Aug 13 '17 at 13:04

We can also use `summarise_at` with `column_to_rownames`:

library(tidyverse)
df %>% 
   group_by(X3) %>% 
   summarise_at(vars(var), mean) %>% 
   as.data.frame() %>%
   column_to_rownames("X3") %>%
   t
#           A         B
#X1 -0.1720188 0.1834966
#X2  0.1413389 0.1138864
akrun
  • Thanks for your reply, @akrun. Your code looks very tidy and elegant and solves my problem perfectly. I'll accept it as the answer. – Jia Gao Aug 13 '17 at 13:01