
Suppose I have this data.frame in R:

ages <- data.frame(Indiv = numeric(),
    Age = numeric(),
    W = numeric())
ages[1,] <- c(1,10,2)
ages[2,] <- c(1,15,5)
ages[3,] <- c(2,5,1)
ages[4,] <- c(2,100,2)

ages

  Indiv Age W
1     1  10 2
2     1  15 5
3     2   5 1
4     2 100 2

If I do:

meanAge <- aggregate(ages$Age,list(ages$Indiv),mean)

I get the mean Age (x) for each Indiv (Group.1):

  Group.1    x
1       1 12.5
2       2 52.5

But I want to calculate the weighted arithmetic mean of Age (weight being W). If I do:

WmeanAge <- aggregate(ages$Age,list(ages$Indiv),weighted.mean,ages$W)

I get:

Error in weighted.mean.default(X[[1L]], ...) : 
  'x' and 'w' must have the same length

I think I should have:

  Group.1           x
1       1 13.57142857
2       2 68.33333333
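These expected values can be checked by hand from the definition of a weighted mean, sum(Age * W) / sum(W), computed within each individual. A minimal sketch in base R (the names `g1` and `g2` are only for illustration):

```r
# Weighted mean by hand: sum(x * w) / sum(w)
g1 <- sum(c(10, 15) * c(2, 5)) / sum(c(2, 5))   # Indiv 1: 95 / 7
g2 <- sum(c(5, 100) * c(1, 2)) / sum(c(1, 2))   # Indiv 2: 205 / 3

# weighted.mean() agrees when x and w have the same length
all.equal(g1, weighted.mean(c(10, 15), c(2, 5)))
all.equal(g2, weighted.mean(c(5, 100), c(1, 2)))
```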

What am I doing wrong? Thanks in advance!

Rodrigo

4 Answers


Doh, you beat me to it. But anyway, here is my answer using both plyr and dplyr:

ages = data.frame(Indiv = c(1,1,2,2),
              Age = c(10,15,5,100),
              W = c(2,5,1,2))

library(plyr)
ddply(ages, .(Indiv), summarize, 
      mean = mean(Age),
      wmean = weighted.mean(Age, w=W))


library(dplyr)
# dplyr chains with %>%; the older %.% operator was deprecated
ages %>%
  group_by(Indiv) %>%
  summarise(mean = mean(Age), wmean = weighted.mean(Age, W))
jaradniemi

The problem is that aggregate does not split up the w arguments – so weighted.mean is receiving subsets of ages$Age, but it is not receiving the equivalent subsets of ages$W.
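One way to see this is to ask each group's function call how long its inputs are. The sketch below is only an illustration: it rebuilds the `ages` data.frame from the question, and `len_x` and `len_w` are hypothetical names.

```r
ages <- data.frame(Indiv = c(1, 1, 2, 2),
                   Age   = c(10, 15, 5, 100),
                   W     = c(2, 5, 1, 2))

# Length of the data each group's call receives (split by Indiv)
len_x <- aggregate(ages$Age, list(ages$Indiv), function(x, w) length(x), ages$W)

# Length of the extra argument each call receives (passed through unsplit)
len_w <- aggregate(ages$Age, list(ages$Indiv), function(x, w) length(w), ages$W)

len_x$x  # 2 2
len_w$x  # 4 4
```

Each group gets 2 values of Age but all 4 values of W, which is exactly the length mismatch that weighted.mean reports.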

Try the plyr package!! It's great. I use it in 95% of the scripts that I write.

library("plyr")

# the plyr package has functions that come in the format of  _ _ ply
# the first blank is the input format, and the second is the output format
# d = data.frame, l = list, a = array, etc.
# thus, with ddply(), you supply a data.frame (ages), and it returns a data.frame (WmeanAge)

# .data is your data set
# .variables is the name of the column (or columns!) to be used to split .data
# .fun is the function you want to apply to each subset of .data

new.weighted.mean <- function(x, ...){
   weighted.mean(x=x[,"Age"], w=x[,"W"], ...)
}

WmeanAge <- ddply(.data=ages, .variables="Indiv", .fun=new.weighted.mean, na.rm=TRUE)
print(WmeanAge)
rbatt
  • I have seen this package here: http://stackoverflow.com/a/10407563/1086511. But the aggregate function has the option to work with functions with more than one argument. Are you telling me it CAN'T do it with weighted.mean? It's against my philosophy of design to use a package to do something the basic functions should do... – Rodrigo May 06 '14 at 19:18
  • @Rodrigo You can definitely supply extra arguments! The problem is that those arguments won't be subsetted by aggregate etc. in the same way that your data are subsetted. When you want multiple arguments subsetted in the same manner, supply those arguments as a data.frame, and adjust the function to look in the right column (as one solution). – rbatt May 06 '14 at 19:22
  • If you explain this as an answer and it works, I'll choose it. Thanks, @rbatt! – Rodrigo May 06 '14 at 19:28
  • @Rodrigo I (hopefully) made some helpful edits to my answer – there is a new function that knows which columns in your supplied data.frame contain x and w, and I used `...` to permit the option to pass additional arguments to the function. In this case, I am using na.rm=TRUE to show how this argument could be set via ddply. – rbatt May 06 '14 at 19:35
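The idea discussed in these comments (subset Age and W together by passing them as one data.frame) can also be sketched without any package. This is a minimal base R illustration, not the answer's own method; `wmean` is a hypothetical name and `ages` is rebuilt from the question.

```r
ages <- data.frame(Indiv = c(1, 1, 2, 2),
                   Age   = c(10, 15, 5, 100),
                   W     = c(2, 5, 1, 2))

# split() cuts the whole data.frame by Indiv, so each piece carries
# its own Age and W columns; the function then picks the right columns
wmean <- sapply(split(ages, ages$Indiv),
                function(d) weighted.mean(d$Age, d$W))

wmean  # named numeric vector, one entry per Indiv
```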

If you want to use base functions, here's one possibility

as.vector(by(ages[c("Age", "W")],
             list(ages$Indiv),
             function(x) {
               do.call(weighted.mean, unname(x))
             }))

Since aggregate won't subset multiple columns, I use the more general by and simplify the result to a vector.
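Because as.vector drops the group labels, it may help to keep them if the result should look like aggregate's output. The following is a sketch under that assumption; the column names `Group.1` and `x` simply mimic aggregate, and `res` is a hypothetical name.

```r
ages <- data.frame(Indiv = c(1, 1, 2, 2),
                   Age   = c(10, 15, 5, 100),
                   W     = c(2, 5, 1, 2))

# by() hands each group's rows (both columns) to the function
res <- by(ages[c("Age", "W")], list(ages$Indiv),
          function(x) weighted.mean(x$Age, x$W))

# Rebuild an aggregate-style data.frame; note Group.1 is character here
WmeanAge <- data.frame(Group.1 = dimnames(res)[[1]],
                       x       = as.vector(res))
WmeanAge
```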

MrFlick

The number of weight values does not match the number of groups, so aggregate cannot collapse the groups properly. Here is a very inelegant solution using a for loop.

ages = data.frame(Indiv=c(1,1,2,2),Age=c(10,15,5,100),W=c(2,5,1,2))

age.Indiv <- vector()
for (i in unique(ages$Indiv)) {
  # weighted mean of Age over the rows belonging to individual i
  age.Indiv <- append(age.Indiv,
                      weighted.mean(ages[ages$Indiv == i, ]$Age,
                                    ages[ages$Indiv == i, ]$W))
}
names(age.Indiv) <- unique(ages$Indiv)
age.Indiv
Jeffrey Evans
  • This is not true. The length of unique values isn't the problem. The problem is that aggregate doesn't subset the additional parameters that are passed to it in the same way it subsets the first parameter. – MrFlick May 06 '14 at 19:21
  • But it does not make sense to have multiple weights for the same group. It seems like these functions are behaving as expected. – Jeffrey Evans May 06 '14 at 19:22
  • I think you give these functions too much credit for deciding what "makes sense." `weighted.mean` should be able to calculate a weighted mean no matter what two vectors you pass it, as long as they are of equal length. The problem here was that they were not the same length, due to the way `aggregate()` handles the `...` parameters. – MrFlick May 06 '14 at 19:33
  • @MrFlick, this is true and I provided a different, albeit, inelegant solution in base. – Jeffrey Evans May 06 '14 at 19:37