
I have a data frame as follows:

date          Rank    new_Weight    c
2019-01-01    20      2             10
2019-01-01    30      5             10
2019-01-01    10      8             10
2019-02-02    3       10            60
2019-02-02    5       2             60
....          ...     ....

I want to calculate the weighted average of Rank for each date, using new_Weight as the weights. I tried the following code:

by(df, df$date,subset) function(x){
  x<-df$rank*df$new_weight/sum(df$new_weigth)
}

and create a new column.

I also wrote the following pipeline, and it works very well:

df<- df %>% group_by(date) %>% mutate(w=weighted.mean(rank,new_weight))

However, I am wondering why the first approach does not work.

user
  • Can't you just use `weighted.mean()` function? `with(df, weighted.mean(Rank, new_Weight))`. – tmfmnk Jul 21 '19 at 07:35
  • You mean I should use `weighted.mean()` inside the function instead of the expression that I wrote? – user Jul 21 '19 at 07:36
  • Your function has an argument `x`, so the expression in its body also needs to use `x`. – jay.sf Jul 21 '19 at 07:40

2 Answers


Does this example answer your question?

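 library(dplyr)  # provides %>%, group_by() and mutate()

 date <- c(2017, 2017, 2018, 2019, 2018, 2019)
 rank <- c(10, 12, 13, 11, 14, 15)
 weight <- c(1.5, 1.1, 1.2, 1.3, 1.4, 1.7)
 df <- data.frame(date, rank, weight)
 df
 # weighted mean of rank per date, using the weight column defined above
 df <- df %>% group_by(date) %>% mutate(w = weighted.mean(rank, weight))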

You don't need to write a custom function to do this ;)

RPo
  • But this one does not take the group_by on date into account when it calculates, does it? – user Jul 21 '19 at 08:29
  • It does: your line `df<- df %>% group_by(date) %>% mutate(w=weighted.mean(rank,new_weight))` does this job perfectly ;) – RPo Jul 21 '19 at 08:31
  • Yes, but how can I fix the problem with the first function? The second one that I wrote already answers the question. – user Jul 21 '19 at 08:36
  • Have a look here: [link](https://stackoverflow.com/questions/31431322/run-a-custom-function-on-a-data-frame-in-r-by-group). It seems to be a related question. – RPo Jul 21 '19 at 08:42

I think that with `by` you need to reference `x`, the per-group data frame passed to your function, and not `df`. The formula for the weighted mean also needs to change: divide the sum of `Rank * new_Weight` by the sum of `new_Weight`.

by(df, df$date, function(x) sum(x$Rank * x$new_Weight)/sum(x$new_Weight))

#df$date: 2019-01-01
#[1] 18
#--------------------------------------------------------------------------------- 
#df$date: 2019-02-02
#[1] 3.333333

which is the same as applying `weighted.mean`:

by(df, df$date, function(x) weighted.mean(x$Rank, x$new_Weight))
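
Since the question also asks for a new column, here is a minimal base R sketch of one way to attach the per-date result to every row. It assumes the column names `Rank` and `new_Weight` shown in the question's table and rebuilds only the five rows that are visible there (the `c` column is left out). The names `res`, `w_by_date` and `w` are hypothetical, chosen only for this illustration.

```r
# Sample data copied from the visible rows of the question
df <- data.frame(
  date       = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01",
                         "2019-02-02", "2019-02-02")),
  Rank       = c(20, 30, 10, 3, 5),
  new_Weight = c(2, 5, 8, 10, 2)
)

# One weighted mean per date; inside the function, x is the subset for that date
res <- by(df, df$date, function(x) weighted.mean(x$Rank, x$new_Weight))

# Same per-group function, but returned as a named numeric vector
# (names are the dates as character strings)
w_by_date <- sapply(split(df, df$date),
                    function(x) weighted.mean(x$Rank, x$new_Weight))

# Look up each row's group value by its date to fill the new column
df$w <- unname(w_by_date[as.character(df$date)])
df
```

The key point in both calls is that the function body refers to `x`, so each group is computed from its own rows rather than from the whole of `df`.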
Ronak Shah