2

Suppose I want to create a mean variable in a given dataframe based on two vectors, one specifying the names of the variables to use, and one specifying weights by which these variables should go into the mean variable:

vars <- c("a", "b", "c","d"))
weights <- c(0.5, 0.7, 0.8, 0.2))
df <- data.frame(cbind(c(1,4,5,7), c(2,3,7,5), c(1,1,2,3), 
                       c(4,5,3,3), c(3,2,2,1), c(5,5,7,1)))
colnames(df) <- c("a","b","c","d","e","f")

How could I use dplyr::mutate() to create a mean variable that uses vars and weights to calculate a rowwise score? mutate() should specifically use the variables supplied by vars The result should basically do the following:

df <- df %>% 
  rowwise() %>% 
  mutate(comp = mean(c(vars[1]*weights[1], vars[2]*weights[2], ...)))

Written out:

df2 <- df %>% 
  rowwise() %>% 
  mutate(comp = mean(c(0.5*a, 0.7*b, 0.8*c, 0.2*d)))

I can't figure out how to do this because, although vars contains the exact variable names that I want to use for mutate in my df, inside vars they are strings. How could I make mutate() understand that the strings vars contains relate to columns in my df? If you know another procedure not using mutate() that's fine also. Thanks!

nklsstll
  • 25
  • 5

3 Answers3

1

You may use

df %>% mutate(wmean = apply(.[vars], 1, weighted.mean, weights))
#   a b c d e f     mean
# 1 1 2 1 4 3 5 1.590909
# 2 4 3 1 5 2 5 2.681818
# 3 5 7 2 3 2 7 4.363636
# 4 7 5 3 3 1 1 4.545455

but there is not much to gain with tidyverse as base R approaches can be almost the same and end up being shorter:

df$wmean <- apply(df[vars], 1, weighted.mean, weights)

or one of the following:

df$wmean <- colSums(t(df[vars]) * weights) / sum(weights)
df$wmean <- as.matrix(df[vars]) %*% weights / sum(weights)
df$wmean <- rowSums(sweep(df[vars], 2, weights, `*`)) / sum(weights)
Julius Vainora
  • 47,421
  • 9
  • 90
  • 102
0

Row-wise operations can be a bit tricky in the tidyverse. This is a case where some base R knowledge can be really handy. For example, you can do it in one line with apply (note that I corrected a typo in the line that creates weights and drop columns e and f, which do not have weights):

vars <- c("a", "b", "c","d")
weights <- c(0.5, 0.7, 0.8, 0.2)
df <- data.frame(cbind(c(1,4,5,7), c(2,3,7,5), c(1,1,2,3), 
                       c(4,5,3,3), c(3,2,2,1), c(5,5,7,1)))
colnames(df) <- c("a","b","c","d","e","f")

df$weighted.mean <- apply(df %>% select(-e, -f), 1, weighted.mean, weights)

  a b c d e f weighted.mean
1 1 2 1 4 3 5      1.590909
2 4 3 1 5 2 5      2.681818
3 5 7 2 3 2 7      4.363636
4 7 5 3 3 1 1      4.545455

If you really wanted to do it in the tidyverse, this should get you started:

library(tidyverse)

df.weights <- data.frame(vars, weights)

df.new <- df %>% 
  mutate(row.num = 1:n()) %>% 
  gather(variable, value, -row.num) %>% 
  left_join(df.weights, by = c(variable = 'vars')) %>% 
  filter(variable %in% vars) %>% 
  group_by(row.num) %>% 
  mutate(weighted.mean = weighted.mean(value, weights))
jdobres
  • 11,339
  • 1
  • 17
  • 37
  • Repeating the OP's `as.vector` call to surround a `c` encourages wasteful operations. That code looks pretty convoluted. – IRTFM Nov 24 '18 at 16:48
  • His code has a spelling error in the second (unnecessary) `as.vectors`. – IRTFM Nov 24 '18 at 16:53
  • Thank you, this also works but I ended up choosing Julius Vainoras answer because it uses the `vars` vector and is even shorter. Better catch up on some of that basic R knowledge! – nklsstll Nov 24 '18 at 17:23
0

There should be a tidyverse solution using pmap, but it eludes me. Here's another approach using tidyverse packages purrr and tibble

library(tidyverse)

vars <- c("a", "b", "c", "d")
weights <- c(0.5, 0.7, 0.8, 0.2)
df <- data.frame(cbind(c(1,4,5,7), c(2,3,7,5), c(1,1,2,3), 
                       c(4,5,3,3), c(3,2,2,1), c(5,5,7,1)))
colnames(df) <- c("a","b","c","d","e","f")

df %>% 
 transpose() %>% 
  simplify_all() %>% 
  map_dbl(~weighted.mean(.x[vars], weights)) %>% 
  add_column(df, wmean = .)
#>   a b c d e f    wmean
#> 1 1 2 1 4 3 5 1.590909
#> 2 4 3 1 5 2 5 2.681818
#> 3 5 7 2 3 2 7 4.363636
#> 4 7 5 3 3 1 1 4.545455

Created on 2018-11-24 by the reprex package (v0.2.1)

Jake Kaupp
  • 7,892
  • 2
  • 26
  • 36