Mean of elements in a list of data.frames

Question

Suppose I had a list of data.frames (of equal rows and columns)

dat1 <- as.data.frame(matrix(rnorm(25), ncol=5))
dat2 <- as.data.frame(matrix(rnorm(25), ncol=5))
dat3 <- as.data.frame(matrix(rnorm(25), ncol=5))

all.dat <- list(dat1=dat1, dat2=dat2, dat3=dat3)

How can I return a single data.frame that is the mean (or sum, etc.) for each element in the data.frames across the list (e.g., mean of first row and first column from lists 1, 2, 3 and so on)? I have tried lapply and ldply in plyr but these return the statistic for each data.frame within the list.

Edit: For some reason, this was retagged as homework. Not that it matters either way, but this is not a homework question. I just don't know why I can't get this to work. Thanks for any insight!

Edit2: For further clarification: I can get the results using loops, but I was hoping there was a simpler and faster way, because my actual data has data.frames of 12 rows by 100 columns and the list holds 1000+ of them.

z <- matrix(0, nrow(all.dat$dat1), ncol(all.dat$dat1))

for(l in 1:nrow(all.dat$dat1)){
   for(m in 1:ncol(all.dat$dat1)){
      z[l, m] <- mean(unlist(lapply(all.dat, `[`, i =l, j = m)))
   }
}

With a result of the means:

> z
        [,1]        [,2]        [,3]        [,4]       [,5]
[1,] -0.64185488  0.06220447 -0.02153806  0.83567173  0.3978507
[2,] -0.27953054 -0.19567085  0.45718399 -0.02823715  0.4932950
[3,]  0.40506666  0.95157856  1.00017954  0.57434125 -0.5969884
[4,]  0.71972821 -0.29190645  0.16257478 -0.08897047  0.9703909
[5,] -0.05570302  0.62045662  0.93427522 -0.55295824  0.7064439

I was wondering if there was a less clunky and faster way to do this. Thanks!

Those aren't means. Those are medians. – Brandon Bertelsen Oct 04 '11 at 18:01

score 19 · Answer 1 · answered Oct 05 '11 at 01:52

Here is a one liner with plyr. You can replace mean with any other function that you want.

library(plyr)  # provides laply() and aaply()
ans1 = aaply(laply(all.dat, as.matrix), c(2, 3), mean)

Ramnath

why c(2,3)? what does that mean? – nafrtiti Apr 23 '18 at 11:32
It is a way to access an array. It basically transforms the data into a 3-dimensional array and then takes the mean across the list dimension for each row and column position. Elegant, kudos. – Mario Fajardo May 28 '18 at 05:38
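
To make the c(2, 3) margin concrete, here is a minimal base R sketch. It assumes (as far as I understand plyr) that laply(all.dat, as.matrix) yields an array whose first dimension indexes the list, and it imitates that layout with simplify2array and aperm rather than calling plyr itself. The names arr and cell.means are only illustrative.

```r
# Illustrative data: three numeric data frames with identical dimensions.
set.seed(42)
all.dat <- list(
  dat1 = as.data.frame(matrix(rnorm(25), ncol = 5)),
  dat2 = as.data.frame(matrix(rnorm(25), ncol = 5)),
  dat3 = as.data.frame(matrix(rnorm(25), ncol = 5))
)

# Stack the matrices into a 5 x 5 x 3 array, then move the list index to the
# front so the dimensions read: list index, rows, columns.
arr <- aperm(simplify2array(lapply(all.dat, as.matrix)), c(3, 1, 2))
dim(arr)

# Keep margins 2 (rows) and 3 (columns); the function is applied over
# margin 1, i.e. across the data frames in the list.
cell.means <- apply(arr, c(2, 3), mean)
```

So c(2, 3) names the dimensions that survive in the result; the remaining dimension (the list index) is the one being averaged over.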

score 12 · Answer 2 · answered Oct 04 '11 at 17:30

You would have an easier time changing the data structure, combining the three two-dimensional matrices into a single three-dimensional array (using the abind package). Then the solution is more direct using apply and specifying the dimensions to keep, so that the function is applied across the third dimension.

EDIT:

When I answered the question, it was tagged homework, so I just gave an approach. The original poster removed that tag, so I will take him/her at his/her word that it isn't.

library("abind")

all.matrix <- abind(all.dat, along=3)
apply(all.matrix, c(1,2), mean)
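
If installing abind is not an option, the same array can probably be built in base R. This is a sketch under the assumption that every data frame is entirely numeric and has identical dimensions; the names all.array, cell.means and cell.medians are illustrative, not part of the answer above. Swapping mean for median shows that any summary function fits this layout.

```r
# Illustrative data: three numeric data frames with identical dimensions.
set.seed(1)
all.dat <- list(
  dat1 = as.data.frame(matrix(rnorm(25), ncol = 5)),
  dat2 = as.data.frame(matrix(rnorm(25), ncol = 5)),
  dat3 = as.data.frame(matrix(rnorm(25), ncol = 5))
)

rows <- nrow(all.dat[[1]])
cols <- ncol(all.dat[[1]])

# unlist() flattens each data frame column by column, one data frame after
# another, which matches the column-major fill order of array().
all.array <- array(unlist(all.dat), dim = c(rows, cols, length(all.dat)))

# Keep dimensions 1 and 2; apply the function across the third (the list).
cell.means   <- apply(all.array, c(1, 2), mean)
cell.medians <- apply(all.array, c(1, 2), median)
```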

I was not aware of abind, I will look into it. Thanks! – ChrisC Oct 04 '11 at 17:50

score 12 · Answer 3 · answered Oct 04 '11 at 17:58

I gave one answer that uses a completely different data structure to achieve the result. This answer uses the data structure (list of data frames) given directly. I think it is less elegant, but wanted to provide it anyway.

Reduce(`+`, all.dat) / length(all.dat)

The logic is to add the data frames together element by element (which + will do with data frames), then divide by the number of data frames. Using Reduce is necessary since + can only take two arguments at a time (and addition is associative).
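
As a quick sanity check, the sketch below compares the Reduce result against the element-by-element loop from the question. The data and the names via.reduce and via.loop are illustrative only.

```r
# Illustrative data: three numeric data frames with identical dimensions.
set.seed(7)
all.dat <- list(
  dat1 = as.data.frame(matrix(rnorm(25), ncol = 5)),
  dat2 = as.data.frame(matrix(rnorm(25), ncol = 5)),
  dat3 = as.data.frame(matrix(rnorm(25), ncol = 5))
)

# Element-wise sum of all data frames, divided by how many there are.
via.reduce <- Reduce(`+`, all.dat) / length(all.dat)

# Reference: the explicit double loop from the question.
via.loop <- matrix(0, nrow(all.dat$dat1), ncol(all.dat$dat1))
for (l in seq_len(nrow(via.loop))) {
  for (m in seq_len(ncol(via.loop))) {
    via.loop[l, m] <- mean(unlist(lapply(all.dat, `[`, i = l, j = m)))
  }
}

# Compare values only; the Reduce result is a data frame with dimnames.
same <- isTRUE(all.equal(as.matrix(via.reduce), via.loop, check.attributes = FALSE))
```

Dropping the division gives the element-wise sum instead of the mean.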

Brian Diggs

This was actually a strategy I initially tried but this only works if I was trying to get means or sums, but I also wanted to have the option of finding the median. I think changing the data structure is likely my best option. – ChrisC Oct 04 '11 at 18:09
I can't think of how to adapt this to median; median needs all the elements at once while mean can be built up two at a time. – Brian Diggs Oct 04 '11 at 18:46
This answer is better than http://stackoverflow.com/a/7651775/4907 when the list of data.frame's is very long. – Michael Schneider Jun 16 '15 at 12:52
This is the cleanest solution, however it fails when there is a character column (for instance a key that is the same in each list). – jzadra Jan 22 '19 at 19:26
@jzadra True, but the "mean" of a vector of character strings is not well defined anyway. In the case where they would just be labels, the `data.frame` could be subset to remove them and then a set added back in afterward. – Brian Diggs Jan 22 '19 at 22:02

score 7 · Answer 4 · answered Oct 05 '11 at 08:56

Another approach using only base functions to change the structure of the object:

listVec <- lapply(all.dat, c, recursive=TRUE)
m <- do.call(cbind, listVec)

Now you can calculate the mean with rowMeans or the median with apply:

means <- rowMeans(m)
medians <- apply(m, 1, median)
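
Note that means and medians come back as plain vectors with one entry per cell. Assuming the flattening runs column by column within each data frame (which is how c(..., recursive=TRUE) treats a data frame), they can be folded back into the original shape. The names mean.mat and median.mat below are illustrative.

```r
# Illustrative data: three numeric data frames with identical dimensions.
set.seed(3)
all.dat <- list(
  dat1 = as.data.frame(matrix(rnorm(25), ncol = 5)),
  dat2 = as.data.frame(matrix(rnorm(25), ncol = 5)),
  dat3 = as.data.frame(matrix(rnorm(25), ncol = 5))
)

# One column per data frame, one row per cell (cells in column-major order).
listVec <- lapply(all.dat, c, recursive = TRUE)
m <- do.call(cbind, listVec)

# matrix() also fills column by column, so the cells land back in place.
mean.mat   <- matrix(rowMeans(m), nrow = nrow(all.dat[[1]]))
median.mat <- matrix(apply(m, 1, median), nrow = nrow(all.dat[[1]]))
```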

score 3 · Answer 5 · answered Oct 04 '11 at 23:11

I would take a slightly different approach:

library(plyr)
tmp <- ldply(all.dat) # convert to df
tmp$counter <- 1:5 # 1:12 for your actual situation
ddply(tmp, .(counter), function(x) colMeans(x[2:ncol(x)]))

Brandon Bertelsen

score 1 · Answer 6 · answered Oct 04 '11 at 19:43

Couldn't you just use nested lapply() calls?

This appears to give the correct result on my machine

mean.dat <- lapply(all.dat, function (x) lapply(x, mean, na.rm=TRUE))

richiemorrisroe

With this code you get the mean of the columns of each data.frame. You obtain the same result with `lapply(all.dat, colMeans)`. – Oscar Perpiñán Oct 05 '11 at 09:01
