
Sorry if this seems trivial, but after searching the internet for some time I couldn't find a solution.

I have a matrix and a factor vector that assigns each column to a group. The goal is to compute rowMeans separately for each factor level while keeping the original matrix structure, so every cell is replaced by the mean of its row within its column group. It would probably be something like ave(), but working on 2-dimensional arrays.

Here is a crude demonstration:

(mat <- rbind(1:5,6:10,11:15))
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15

groups <- c(1,1,1,2,2)

mat[,groups==1] <- rowMeans(mat[,groups==1]) # I am asking about this part
mat[,groups==2] <- rowMeans(mat[,groups==2]) # ...

mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    2    2  4.5  4.5
[2,]    7    7    7  9.5  9.5
[3,]   12   12   12 14.5 14.5
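
The two manual assignments above generalize to a loop over the distinct group labels. A minimal sketch, where the names `out`, `g` and `cols` are only illustrative:

```r
mat <- rbind(1:5, 6:10, 11:15)
groups <- c(1, 1, 1, 2, 2)

# Copy the input so it stays untouched; assigning fractional means
# converts the integer copy to double automatically.
out <- mat
for (g in unique(groups)) {
  cols <- groups == g
  # rowMeans() gives one value per row; that vector is recycled
  # down each of the selected columns.
  out[, cols] <- rowMeans(mat[, cols, drop = FALSE])
}
out
```

The loop runs once per group rather than once per row, so its cost should be dominated by the vectorised rowMeans() calls.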

In practice this matrix would have millions of rows (and far fewer columns), so solutions that work row by row might be too slow.

I am on the way to writing my own function, but this seems like something that might have an easy one-line solution.

Karolis Koncevičius

3 Answers

1) Assuming that you want to replace every element of each row with the mean of that row, try this where m is your matrix:

ave(m, row(m))

If that is not what you want, please provide a complete example including input and desired output.

2) For the updated question try this, where group is the column grouping vector (called groups in the question):

t(ave(t(m), group, t(row(m))))

or this equivalent variation:

ave(m, matrix(group, nrow(m), ncol(m), byrow = TRUE), row(m))
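
As a quick check, here is the second variation applied to the example data from the question. This is only a sketch; `grp_mat` and `res` are names introduced here for illustration:

```r
mat <- rbind(1:5, 6:10, 11:15)
groups <- c(1, 1, 1, 2, 2)

# Expand the column grouping to the full matrix shape, so every cell
# carries its column's group label alongside its row index.
grp_mat <- matrix(groups, nrow(mat), ncol(mat), byrow = TRUE)

# ave() averages within each (group, row) combination and writes the
# result back in place, preserving the matrix dimensions.
res <- ave(mat, grp_mat, row(mat))
res
```

Note that ave() builds one cell per (group, row) combination, so the number of cells grows with the number of rows.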
G. Grothendieck

Maybe like this:

mat.list  <- Map(matrix, split(mat, groups[col(mat)]), nrow = nrow(mat))
mean.list <- Map(rowMeans, mat.list)
do.call(cbind, mean.list[groups])

Or for greater speed:

idx.list  <- split(seq_len(ncol(mat)), groups)
get.cols  <- function(mat, idx) mat[, idx, drop = FALSE]
mat.list  <- lapply(idx.list, get.cols, mat = mat)
mean.list <- lapply(mat.list, rowMeans)
do.call(cbind, mean.list[groups])
flodel
  • I gladly accept your answer. It's correct and gave me some new ideas. However, I feel like a for loop would be easier to understand in this case :) I really thought I was missing some basic one-word function. – Karolis Koncevičius Oct 15 '14 at 01:56

It would be nice if there was an optimized function for this, something like rowGroupMeans, but I'm not aware of such a thing.

My solution is to use rowsum, as follows:

means <- rowsum(t(mat), groups)/tabulate(groups)
t(means)[, groups]

      1  1  1    2    2
[1,]  2  2  2  4.5  4.5
[2,]  7  7  7  9.5  9.5
[3,] 12 12 12 14.5 14.5
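
The snippet above relies on groups being the integers 1..k, because tabulate() counts by integer value and `[, groups]` indexes by position. For arbitrary labels, a sketch along the following lines might be safer; the interleaved character labels and the names `f`, `sums`, `counts` and `res` are made up for illustration:

```r
mat <- rbind(1:5, 6:10, 11:15)
groups <- c("a", "b", "a", "b", "b")   # hypothetical labels, not 1..k

f <- factor(groups)

# One row of sums per group; the row names are the group labels.
sums <- rowsum(t(mat), f)

# Group sizes, looked up by label so the order always matches `sums`.
counts <- table(f)
means <- sums / as.vector(counts[rownames(sums)])

# Pick each column's group mean by label, then transpose back to the
# original orientation (rows of mat as rows).
res <- t(means[as.character(f), , drop = FALSE])
res
```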

This scales quite well to bigger problems, e.g.

mat <- matrix(1:100e6, ncol = 100)
groups <- rep(1:10, each = 10)

## Map solution
for (i in 1:3){
    print(system.time({
        mat.list  <- Map(matrix, split(mat, groups[col(mat)]), nrow = nrow(mat))
        mean.list <- Map(rowMeans, mat.list)
        ans1 <- do.call(cbind, mean.list[groups])
    }))
}

   user  system elapsed 
   8.20    1.26    9.66 
   user  system elapsed 
  11.84    1.94   13.90 
   user  system elapsed 
  10.70    1.89   12.79

## rowsum solution
for (i in 1:3){
    print(system.time({
        means <- rowsum(t(mat), groups)/tabulate(groups)
        ans2 <- t(means)[,groups]
    }))
}

   user  system elapsed 
   1.56    0.22    1.78 
   user  system elapsed 
   1.48    0.27    1.74 
   user  system elapsed 
   1.57    0.14    1.72

As already noted, the ave solution does not scale well: my R session crashed when I tried to run timings for it.

Heather Turner
  • Thank you for the response. Your solution is the most elegant for rowMeans and really fast. However, I used "rowMeans" only as an example (maybe that was not clear) and was aiming for a full ave()-type approach. That is, I want to be able to specify other types of functions in the end (like rowMedians). I will probably write some kind of wrapper for this myself. All the answers here gave me some ideas. Thank you again. – Karolis Koncevičius Oct 15 '14 at 13:51
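
A wrapper of the kind described in this comment could look roughly like the sketch below. It follows the split-by-column-index idea from the second answer; `row_group_apply` and `row_med` are hypothetical names, and FUN is assumed to take a matrix and return one value per row:

```r
# Hypothetical helper: apply a row-wise summary within each column group
# and expand the result back to the shape of the input matrix.
row_group_apply <- function(mat, groups, FUN = rowMeans, ...) {
  stopifnot(length(groups) == ncol(mat))
  # Column indices belonging to each group, named by group label
  idx <- split(seq_len(ncol(mat)), groups)
  # One column of per-row summaries for every group
  stats <- do.call(cbind, lapply(idx, function(j) {
    FUN(mat[, j, drop = FALSE], ...)
  }))
  # Repeat each group's column once per original column, by label
  out <- stats[, as.character(groups), drop = FALSE]
  dimnames(out) <- dimnames(mat)
  out
}

mat <- rbind(1:5, 6:10, 11:15)
groups <- c(1, 1, 1, 2, 2)

res_mean <- row_group_apply(mat, groups)          # group-wise row means

# Base-R stand-in for a row median; it loops over rows via apply(),
# so a dedicated row-median function is likely faster on large input.
row_med <- function(x) apply(x, 1, median)
res_med <- row_group_apply(mat, groups, FUN = row_med)
```

Any function with the same contract can be passed as FUN, for example rowMedians from the matrixStats package if it is installed.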