2

The question is basically the samt as this: Aggregate and Weighted Mean in R.

But i want it to compute it on several columns, using data.table, as I have millions of rows. So something like this:

set.seed(42)   # fix seed so that you get the same results
dat <- data.frame(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                                    tax=rnorm(20),tax2=rnorm(20), assets=1e7+1e7*runif(20), assets2=1e6+1e7*runif(20))

DT <- data.table(dat)

I can compute the weighted mean on one column, assets, like this:

DT[,list(wret = weighted.mean(tax,assets)),by=assetclass]

But how to do it on both assets and assets2?
What if there are several columns, like col=c("assets1", "assets2", "assets3", ... )? And is it also possible to do it for tax, tax1...

Community
  • 1
  • 1
Jeppe Olsen
  • 968
  • 8
  • 19
  • Sorry for the confusion, the dot(.) sneaked in accidentally, and the return was inherited from the original question. – Jeppe Olsen Mar 24 '17 at 07:51
  • DT[, .(wret = weighted.mean(ret,assets)), by=.(assetclass, assets)] wont work, as it would group by assetclass and assets. What i want is two different rows used when weighing the mean. Something like this: DT[, .(wret = weighted.mean(ret,.(assets, assets2)), by=assetclass] – Jeppe Olsen Mar 24 '17 at 07:54
  • 1
    What is with `DT[, .(wret = weighted.mean(tax,assets), wret2 = weighted.mean(tax,asets2)), by=assetclass]` ? Or what does it mean: *But how to do it on both* ? Please give your desired result, i.e. edit your question. – jogo Mar 24 '17 at 07:57
  • We are getting there :) - What if there is a whole string of columns, like col=c("assets1", "assets2", "assets3", ... ) – Jeppe Olsen Mar 24 '17 at 08:09

1 Answers1

4

So you can do it for several columns of weights

DT <- data.table(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                  tax=rnorm(20), assets=1e7+1e7*runif(20), asets2=1e6+1e7*runif(20))
DT[, lapply(.SD, FUN=weighted.mean, x=tax), by=assetclass, .SDcols=3:4]
#    assetclass      assets       asets2
# 1:          D -0.14179882 -0.003717957
# 2:          B  0.61146928  0.523913589
# 3:          E -0.28037796 -0.147677384
# 4:          C -0.09658125 -0.010338894
# 5:          A  0.74954460  0.750190947

or you can exclude the non-weight columns from .SD:

DT[, lapply(.SD, FUN=weighted.mean, x=tax), by=assetclass, .SDcols=-(1:2)]

Here is a variant using matrix multiplication:

DT[, as.list(crossprod(as.matrix(.SD), tax)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]

The matrix multiplication can do it also for several columns tax1, tax2, ...

DT <- data.table(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                 tax1=rnorm(20), tax2=rnorm(20), assets=1e7+1e7*runif(20), asets2=1e6+1e7*runif(20))
DT[, as.list(crossprod(as.matrix(.SD), tax1)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]
DT[, as.list(crossprod(as.matrix(.SD), tax2)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]
DT[, as.list(crossprod(as.matrix(.SD), cbind(tax1, tax2))/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]
jogo
  • 12,469
  • 11
  • 37
  • 42