-3

I have the following dataframe:

df = data.frame(date = c("26/06/2013", "26/06/2013",  "26/06/2013",  "27/06/2013", "27/06/2013", "27/06/2013", "28/06/2013", "28/06/2013",   "28/06/2013"), return = c(".51", ".32", ".34", ".39", "1.1", "3.2", "2.1", "5.3", "2.1"), cap = c("500", "235", "392", "213", "134", "144", "232", "155", "213"), weight = c("0.443655723", "0.20851819", "0.347826087", "0.433808554", "0.272912424", "0.293279022", "0.386666667", "0.258333333", "0.355"))

I would like to calculate:

1) The last column of "weight". Which is the weights of the "cap" column PER DAY.

2) The weighted "cap" mean of "return" PER DAY. I want to get the following output:

result = data.frame(date = c("26/06/2013", "27/06/2013", "28/06/2013"), cap.weight.mean = c("0.411251109", "1.407881874", "2.926666667"))
ekad
  • 14,436
  • 26
  • 44
  • 46
rwn1v
  • 777
  • 3
  • 10
  • 23
  • hello and welcome to SO. Can you please elaborate on your question. Specifically, what is meant by "last column of weight"? Is `weight` not the last column of `df`. Also, what do you mean by "weighted cap mean of return" ? – Ricardo Saporta Jul 13 '13 at 00:40

3 Answers3

2

Another possibility using plyr function:

library(plyr)
# Change factor to numeric
> df[,-1]<-sapply(df[,-1],function(x){as.numeric(as.character(x))})
> ddply(df,.(date),summarize,cap.weight.mean=weighted.mean(return,weight))
        date cap.weight.mean
1 26/06/2013       0.4112511
2 27/06/2013       1.4078819
3 28/06/2013       2.9266667
Jd Baba
  • 5,948
  • 18
  • 62
  • 96
0

If necessary, change factors to numeric first

df$return=as.numeric(levels(df$return))[df$return]
df$cap=as.numeric(levels(df$cap))[df$cap]
df$weight=as.numeric(levels(df$weight))[df$weight]

Question 1)

 library(plyr)
 #pretend weight column were absent in df
 ddply(df[,-ncol(df)],"date",function(x) data.frame(x,weight=x$cap/sum(x$cap)))

Question 2)

 library(plyr)
 ddply(df,"date",function(x) data.frame(date=x$date[1],cap.weight.mean=sum(x$cap*x$return)/sum(x$cap)))
cryo111
  • 4,444
  • 1
  • 15
  • 37
0

Here's another option using by!

After converting to numeric as cryo111 mentioned.

R> by(df, df$date, FUN = function(x) weighted.mean(x$return, w = x$weight) )
df$date: 26/06/2013
[1] 0.4112511
------------------------------------------------------------ 
df$date: 27/06/2013
[1] 1.407882
------------------------------------------------------------ 
df$date: 28/06/2013
[1] 2.926667

That produces the info in your result data.frame. I am guessing that is what you are looking for

Here's another solution using memisc:::aggregate.formula

> library(memisc)
> aggregate(weighted.mean(return, weight) ~ date, data = df)
>        date weighted.mean(return, weight)
1 26/06/2013                     0.4112511
4 27/06/2013                     1.4078819
7 28/06/2013                     2.9266667
Jake Burkhead
  • 6,435
  • 2
  • 21
  • 32