a <- c(1, 2, NA, 4)
b <- c(10, NA, 30, 40)
weighted.mean(a, b, na.rm = TRUE)

The code above returns NA. I think na.rm only ignores the NA values in vector a, not in b. How can I also ignore the NA in vector b, i.e. in the weights? I can't change the NA to 0. I know that would do the trick, but I'm looking for a tweak in the formula itself.
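To see why the weights still leak an NA, here is a sketch that mirrors the na.rm = TRUE branch of base R's weighted.mean.default as it reads in recent R versions: it drops only the positions where x is NA, then computes sum((x * w)[w != 0]) / sum(w). This is a reconstruction for illustration, not the actual source, and details may differ between R versions.

```r
a <- c(1, 2, NA, 4)
b <- c(10, NA, 30, 40)

# na.rm = TRUE only filters on the values (a), not on the weights (b)
keep <- !is.na(a)
x <- a[keep]   # 1, 2, 4
w <- b[keep]   # 10, NA, 40: the NA weight survives the filter

# Both the weighted sum and the total weight propagate that NA
num <- sum((x * w)[w != 0])
den <- sum(w)
num / den      # NA, same as weighted.mean(a, b, na.rm = TRUE)
```

So the fix has to drop (or zero out) the positions where the weight is NA as well.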

user438383
Jain

4 Answers


This is the function I ended up writing to solve this problem:

weighted_mean <- function(x, w, ..., na.rm = FALSE) {
  if (na.rm) {
    # drop every row where either the value or its weight is NA
    df_omit <- na.omit(data.frame(x, w))
    return(weighted.mean(df_omit$x, df_omit$w, ...))
  }
  weighted.mean(x, w, ...)
}
Mhairi McNeill

I adapted Mhairi's code so that it uses neither data.frame nor na.omit:

weighted_mean <- function(x, w, ..., na.rm = FALSE) {
  if (na.rm) {
    # keep only positions where both the value and its weight are present
    keep <- !is.na(x) & !is.na(w)
    w <- w[keep]
    x <- x[keep]
  }
  weighted.mean(x, w, ..., na.rm = FALSE)
}

It's surprising that the built-in weighted.mean with na.rm = TRUE doesn't drop NA weights. I wasted a few hours discovering that.

EDIT: here is also a data.table approach, in case someone wants to calculate grouped weighted means:

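library(data.table)
# mean of column a weighted by b, grouped by g1 and g2
DT[!is.na(b), .(wm = weighted.mean(a, b, na.rm = TRUE)), .(g1, g2)]
# groups whose b values are all NA are removed by the filter and do not appear;
# wm is missing (is.na(wm) is TRUE) for a group whose remaining rows all have a NA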
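A small sketch that exercises the grouped line on made-up data (the column names g1, g2, a, b follow the comment; the values are hypothetical). It shows the three cases: a group with one NA weight, a group whose weights are all NA, and a group whose only non-NA weight sits next to an NA value.

```r
library(data.table)

# Hypothetical toy data
DT <- data.table(
  g1 = c("x", "x", "x", "y", "z"),
  g2 = 1L,
  a  = c(1, 2, 4, 5, NA),
  b  = c(10, NA, 40, NA, 20)
)

res <- DT[!is.na(b), .(wm = weighted.mean(a, b, na.rm = TRUE)), .(g1, g2)]
print(res)
# group "x": (1 * 10 + 4 * 40) / (10 + 40) = 3.4
# group "y": every b is NA, so no rows survive the filter and the group is absent
# group "z": one row survives, but its a is NA, so wm is missing
```

Because the filter runs in i before grouping, an all-NA-weight group disappears rather than showing up with a missing wm.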
webb

I made a simple modification to the weights passed to weighted.mean, wrapping them in dplyr::coalesce so that missing weights become 0:

a = c(1,2,NA,4)
b = c(10,NA,30,40)
weighted.mean(a, dplyr::coalesce(b, 0), na.rm = TRUE)

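The idea is to replace missing weights with zeros on the fly, without modifying b itself. A zero weight removes that term from both the weighted sum and the total weight, so the NA no longer propagates. The result is 3.4.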
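A sketch of the arithmetic behind the 3.4. It swaps in base R's replace() for dplyr::coalesce (equivalent here for a numeric vector), so the snippet needs no dplyr; the variable names b0, by_hand and builtin are illustrative only.

```r
a <- c(1, 2, NA, 4)
b <- c(10, NA, 30, 40)

# Zero out missing weights on a copy; b itself is unchanged
b0 <- replace(b, is.na(b), 0)   # 10, 0, 30, 40

# na.rm = TRUE drops the NA in a (together with its weight 30);
# the zero weight on a = 2 adds nothing to either sum
by_hand <- (1 * 10 + 4 * 40) / (10 + 40)   # 170 / 50
builtin <- weighted.mean(a, b0, na.rm = TRUE)
```

Note that treating a missing weight as 0 is equivalent to dropping that observation, which is the same outcome as the filtering functions in the other answers.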


Another option is collapse::fmean, which treats missing weights as 0. It also defaults to na.rm = TRUE and is very fast (see the benchmark below).

library(collapse)
fmean(a, w = b)
#[1] 3.4

Benchmark:

microbenchmark::microbenchmark(
  collapse = fmean(a, w = b),
  coalesce = weighted.mean(a, dplyr::coalesce(b, 0), na.rm = TRUE),
  webb = weighted_mean(a, b, na.rm = TRUE)
)

# Unit: microseconds
#      expr     min      lq      mean  median       uq     max neval
#  collapse   5.302   6.401   9.11210   8.301  11.2010  27.601   100
#  coalesce 261.201 274.052 288.82310 280.401 291.2515 528.500   100
#      webb   7.202   8.951  11.26096  11.501  13.3010  19.202   100
Maël