repeatedly applying ave for computing group means in a data frame

Question

The following code computes the group means of x and y separately, by group. Suppose I have many variables that need the same operation.

How would you suggest obtaining the same result with a single command? (I suppose tapply is needed, but I am not really sure about it.)

x=seq(1,11,by=2); y=seq(2,12,by=2); group=rep(1:2, each=3)
dat <- data.frame(cbind(group, x, y))

dat$m_x <- ave(dat$x, dat$group)
dat$m_y <- ave(dat$y, dat$group)
dat

Many thanks.

Do you have more than two columns to apply this to (more than just x and y)? (If you have only two, yours would seem to be the simplest solution- you hardly need to condense it into one command). — David Robinson, Jan 03 '13 at 19:07
Yes, I have a number of columns. sorry, I edited the initial question — Stefano Lombardi, Jan 03 '13 at 19:09
See my answer below, which creates a new matrix where each column has been modified using `ave`. If you need it in a data frame (or need to put it in the original data frame), that's simple enough to modify. — David Robinson, Jan 03 '13 at 19:14
`sapply(dat,ave,dat$group)`. However, I recommend `plyr::ddply` or `data.table` for this. — Roland, Jan 03 '13 at 19:15

Arun · Accepted Answer · 2013-01-04T07:01:23.197

Alternative solutions using data.table and plyr packages:

1) Using data.table

require(data.table)
dt <- data.table(dat, key="group")
# Following @Matthew's comment, edited:
dt[, `:=`(m_x = mean(x), m_y = mean(y)), by=group]

Output:

   group  x  y m_x m_y
1:     1  1  2   3   4
2:     1  3  4   3   4
3:     1  5  6   3   4
4:     2  7  8   9  10
5:     2  9 10   9  10
6:     2 11 12   9  10

2) Using plyr and transform:

require(plyr)
ddply(dat, .(group), transform, m_x=mean(x), m_y=mean(y))

Output:

  group  x  y m_x m_y
1     1  1  2   3   4
2     1  3  4   3   4
3     1  5  6   3   4
4     2  7  8   9  10
5     2  9 10   9  10
6     2 11 12   9  10

3) Using plyr and numcolwise (note the reduced output, one row per group):

ddply(dat, .(group), numcolwise(mean))

Output:

  group x  y
1     1 3  4
2     2 9 10
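
With many columns, the data.table approach in 1) can probably be written without naming each variable. The following is only a sketch: `cols` and `new_cols` are names made up here, and it assumes every non-group column is numeric.

```r
library(data.table)

# Same toy data as in the question
x <- seq(1, 11, by = 2); y <- seq(2, 12, by = 2); group <- rep(1:2, each = 3)
dt <- data.table(group, x, y)

# Hypothetical helpers: columns to average and the names of the new columns
cols <- setdiff(names(dt), "group")
new_cols <- paste0("m_", cols)

# Add all group means by reference in a single grouping step
dt[, (new_cols) := lapply(.SD, mean), by = group, .SDcols = cols]
```

The parentheses around `new_cols` tell data.table to use the contents of the variable as column names rather than creating a column literally called `new_cols`.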

+1 too. Btw, you can add both columns by reference in the same grouping step, to save grouping twice: ``dt[, `:=`(m_x=mean(x), m_y=mean(y)), by=group]``. – Matt Dowle, Jan 03 '13 at 23:39

score 3 · Answer 2 · answered Jan 03 '13 at 19:12

Assuming you have more than just two columns, you can use apply to run ave on every column of the matrix. Note that dat here is a matrix holding only the value columns; group is kept as a separate vector.

x=seq(1,11,by=2); y=seq(2,12,by=2); group=rep(1:2, each=3)
dat <- cbind(x, y)

ave.dat <- apply(dat, 2, function(column) ave(column, group))
#      x  y
# [1,] 3  4
# [2,] 3  4
# [3,] 3  4
# [4,] 9 10
# [5,] 9 10
# [6,] 9 10
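
If the data are already in a data frame, as in the question, a similar idea should work with lapply, writing the results straight back as new columns. This is a sketch rather than the code from this answer: `vars` is a name introduced here, and the `m_` prefix just follows the question's naming.

```r
# Same toy data as in the question, kept as a data frame
x <- seq(1, 11, by = 2); y <- seq(2, 12, by = 2); group <- rep(1:2, each = 3)
dat <- data.frame(group, x, y)

# Hypothetical helper: every column except the grouping variable
vars <- setdiff(names(dat), "group")

# ave(column, dat$group) returns the group mean repeated for each row;
# lapply runs it on every selected column and the list fills the new columns
dat[paste0("m_", vars)] <- lapply(dat[vars], ave, dat$group)
dat
```

This keeps the original row order and column types, and the result is a data frame without any conversion step.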

David Robinson


Thanks! is it possible to directly obtain a data frame as the final result? – Stefano Lombardi Jan 03 '13 at 19:19
Could just do `as.data.frame(ave.dat)` – David Robinson Jan 03 '13 at 19:54

score 1 · Answer 3 · answered Jan 05 '13 at 22:09

You can also use aggregate():

dat2 <- data.frame(dat, aggregate(dat[,-1], by=list(dat$group), mean)[group, -1])
dat2
    group  x  y x.1 y.1
1       1  1  2   3   4
1.1     1  3  4   3   4
1.2     1  5  6   3   4
2       2  7  8   9  10
2.1     2  9 10   9  10
2.2     2 11 12   9  10
row.names(dat2) <- rownames(dat)
colnames(dat2) <- gsub("(.)\\.1", "m_\\1", colnames(dat2))
dat2
  group  x  y m_x m_y
1     1  1  2   3   4
2     1  3  4   3   4
3     1  5  6   3   4
4     2  7  8   9  10
5     2  9 10   9  10
6     2 11 12   9  10

If the variable names are more than a single character, you would need to modify the gsub() call.
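
One way to sidestep the gsub() step might be to rename the aggregated columns before binding them on, so the length of the variable names no longer matters. The sketch below assumes dat is the data frame from the question; `vars`, `means` and `idx` are names introduced here.

```r
# Same toy data as in the question
x <- seq(1, 11, by = 2); y <- seq(2, 12, by = 2); group <- rep(1:2, each = 3)
dat <- data.frame(group, x, y)

# Hypothetical helper: every column except the grouping variable
vars <- setdiff(names(dat), "group")

# One row of means per group, with the value columns renamed up front
means <- aggregate(dat[vars], by = list(group = dat$group), FUN = mean)
names(means)[-1] <- paste0("m_", vars)

# Look up each row's group in the table of means, then bind the columns on
idx <- match(dat$group, means$group)
dat2 <- cbind(dat, means[idx, -1, drop = FALSE])
rownames(dat2) <- NULL
dat2
```

Using match() instead of indexing by the group values directly should also work when the group labels are not the integers 1, 2, and so on.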
