Weighted mean calculation in R with missing values

Question

Is it possible to calculate a weighted mean in R when some values are missing, so that the weights of the values that are present get scaled up proportionately?

To make this concrete, I created a hypothetical scenario. It shows the root of the question: the scaling factor has to be adjusted for each row, depending on which values are missing.

Image: Weighted Mean Calculation

File: Weighted Mean Calculation in Excel

It's definitely possible to do in R. Try having a go yourself and posting some example code here where you run into problems. — Scransom, Oct 01 '17 at 22:52
Thanks qqq. There are many similar samples of code in related questions, [link](https://stackoverflow.com/questions/40541172/weighted-average-value-in-the-presence-of-na-values?rq=1), but it seems like most want to mutate, or replace with the mean, or replace with zero, when there is an N/A. Without being a burden and asking the same question, I thought it might be easier to show the explicit difference with my case, where I want to re-scale the remaining variables. I hadn't seen that elsewhere. And it might just be an obvious, short answer, by using **na.rm**. — milaske, Oct 01 '17 at 23:10

score 1 · Accepted Answer · answered Oct 01 '17 at 22:58

Using weighted.mean from the base stats package with the argument na.rm = TRUE should get you the result you need. Here is a tidyverse way this could be done:

library(tidyverse)
scores <- tribble(
 ~student, ~test1, ~test2, ~test3,
   "Mark",     90,     91,     92,
   "Mike",     NA,     79,     98,
   "Nick",     81,     NA,     83)

weights <- tribble(
  ~test,   ~weight, 
  "test1",     0.2, 
  "test2",     0.4,
  "test3",     0.4)

scores %>% 
  gather(test, score, -student) %>%
  left_join(weights, by = "test") %>%
  group_by(student) %>%
  summarise(result = weighted.mean(score, weight, na.rm = TRUE))
#> # A tibble: 3 x 2
#>   student   result
#>     <chr>    <dbl>
#> 1    Mark 91.20000
#> 2    Mike 88.50000
#> 3    Nick 82.33333

Thank you @markdly. I suppose there was a much easier way to ask the question without charts and excel files. What I didn't understand based on the documentation was the effect of **na.rm**. By making that TRUE, you confirmed that it solves the root of my problem, which is automatically scaling the existing weights based on the variables with data. I thought it was going to be much more difficult because the missing variables are different row by row. Thanks again. — milaske, Oct 01 '17 at 23:16
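
The rescaling described in the comment above can be checked by hand. This is a minimal base R sketch using Mike's row from the answer; the variable names (`x`, `w`, `keep`, `w_rescaled`) are illustrative and not from the original post. Dropping the missing score together with its weight, then dividing by the sum of the remaining weights, should match what `weighted.mean` does with `na.rm = TRUE`.

```r
# Mike's scores from the example above; test1 is missing
x <- c(test1 = NA, test2 = 79, test3 = 98)
w <- c(test1 = 0.2, test2 = 0.4, test3 = 0.4)

# Manual version: keep only observed scores and rescale their weights to sum to 1
keep <- !is.na(x)
w_rescaled <- w[keep] / sum(w[keep])
manual <- sum(x[keep] * w_rescaled)

# Built-in version: na.rm = TRUE drops the NA score together with its weight
builtin <- weighted.mean(x, w, na.rm = TRUE)

print(manual)
print(builtin)
```

Both values should agree with the 88.5 shown for Mike in the answer's output.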

score 0 · Answer 2 · answered Oct 01 '17 at 22:55

The best way to post an example dataset is to use dput(head(dat, 20)), where dat is the name of a dataset. Graphic images are a really bad choice for that.
DATA.

dat <-
structure(list(Test1 = c(90, NA, 81), Test2 = c(91, 79, NA), 
    Test3 = c(92, 98, 83)), .Names = c("Test1", "Test2", "Test3"
), row.names = c("Mark", "Mike", "Nick"), class = "data.frame")

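# w holds the weights themselves (0.2, 0.4, 0.4 for every student), not
# weight * score products: weighted.mean() multiplies by the scores itself and,
# with na.rm = TRUE, drops the weight of any missing score.
w <-
structure(list(Test1 = c(0.2, 0.2, 0.2), Test2 = c(0.4, 0.4, 0.4
), Test3 = c(0.4, 0.4, 0.4)), .Names = c("Test1", "Test2", "Test3"
), row.names = c("Mark", "Mike", "Nick"), class = "data.frame")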

CODE.
You can use the function weighted.mean from the base package stats together with sapply for this. Note that if your datasets of scores and weights are R objects of class matrix, you will not need unlist.

sapply(seq_len(nrow(dat)), function(i){
    weighted.mean(unlist(dat[i,]), unlist(w[i, ]), na.rm = TRUE)
})
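
If the weights are the same for every student, a single weight vector is enough. The following is a sketch under that assumption, with the scores stored as a numeric matrix; the names `score_mat` and `test_weights` are made up for this example. `apply` passes each row to `weighted.mean` as a plain numeric vector, so `unlist` is not needed.

```r
# Scores as a numeric matrix, one row per student
score_mat <- matrix(
  c(90, 91, 92,
    NA, 79, 98,
    81, NA, 83),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Mark", "Mike", "Nick"), c("Test1", "Test2", "Test3"))
)

# One weight per test, shared by all rows (assumed for this sketch)
test_weights <- c(Test1 = 0.2, Test2 = 0.4, Test3 = 0.4)

# Row-wise weighted mean; na.rm = TRUE drops missing scores and their weights
result <- apply(score_mat, 1, weighted.mean, w = test_weights, na.rm = TRUE)
print(result)
```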

Rui Barradas


Thank you. There are many related posts with similar data and code, and I suppose the reason I posted the image was to explicitly show the importance of scaling the existing factors, which I hadn't seen elsewhere. In the documentation [link](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/weighted.mean.html), **na.rm** is a logical value indicating whether NA values in x should be stripped before the computation proceeds. You show this as TRUE. Does this handle the re-weighting automatically? – milaske Oct 01 '17 at 23:03
@milaske I believe that yes, like the link says `the weights coerced to numeric by as.numeric and normalized to sum to one`. And in my tests the results were equal to yours, with some rounding problems only. – Rui Barradas Oct 02 '17 at 00:47
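
The normalization quoted in the comment above implies that only the relative sizes of the weights matter. Here is a quick check using Nick's scores; `w_frac` and `w_pct` are hypothetical names for the same weights expressed as fractions and as percentages.

```r
x <- c(81, NA, 83)          # Nick's scores, test2 missing
w_frac <- c(0.2, 0.4, 0.4)  # weights as fractions
w_pct  <- c(20, 40, 40)     # the same weights as percentages

a <- weighted.mean(x, w_frac, na.rm = TRUE)
b <- weighted.mean(x, w_pct,  na.rm = TRUE)

# TRUE if both scalings give the same weighted mean
print(isTRUE(all.equal(a, b)))
```

One caveat: `na.rm` only looks at missing values in `x`. A missing weight paired with an observed score still gives `NA` as the result.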
