In a named argument to dplyr::funs, can I reference the names of other arguments?

Question

Consider the following:

library(tidyverse)

df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)

df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - mean(.)) / sd(.)))

Is there a way to avoid calling mean and sd twice by referencing the avg and dev columns? What I have in mind is something like:

df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - avg) / dev))

Clearly this won't work, because the resulting columns are not named avg and dev but x_avg, x_dev, y_avg, y_dev, etc.

Is there a good way, within funs, to use the rlang tools to create those column references programmatically, so that I can refer to columns created by the previous named arguments to funs? (When . is x, I would reference x_avg and x_dev for calculating x_scaled, and so forth.)

score 5 · Answer 1 · answered Nov 04 '18 at 21:22

I think it would be easier if you convert your data to long format:

library(tidyverse)

set.seed(111)
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)

df %>% 
  gather(key, value) %>% 
  group_by(key) %>% 
  mutate(avg    = mean(value),
         sd     = sd(value),
         scaled = (value - avg) / sd)
#> # A tibble: 300 x 5
#> # Groups:   key [3]
#>    key    value     avg    sd scaled
#>    <chr>  <dbl>   <dbl> <dbl>  <dbl>
#>  1 x      0.235 -0.0128  1.07  0.232
#>  2 x     -0.331 -0.0128  1.07 -0.297
#>  3 x     -0.312 -0.0128  1.07 -0.279
#>  4 x     -2.30  -0.0128  1.07 -2.14 
#>  5 x     -0.171 -0.0128  1.07 -0.148
#>  6 x      0.140 -0.0128  1.07  0.143
#>  7 x     -1.50  -0.0128  1.07 -1.39 
#>  8 x     -1.01  -0.0128  1.07 -0.931
#>  9 x     -0.948 -0.0128  1.07 -0.874
#> 10 x     -0.494 -0.0128  1.07 -0.449
#> # ... with 290 more rows

Created on 2018-11-04 by the reprex package (v0.2.1.9000)

Worth noting that, if desired, the result can be converted back to the wide format using `spread`. — Artem Sokolov, Nov 19 '18 at 16:23
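As a rough illustration of the round trip mentioned in the comment above, here is a minimal sketch. It assumes a helper column, here called row (a hypothetical name, not in the original answer), is added before gathering so that spread can line the values up again; only the scaled values are spread back.

```r
library(dplyr)
library(tidyr)

set.seed(111)
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)

wide <- df %>%
  mutate(row = row_number()) %>%        # hypothetical row id, needed by spread()
  gather(key, value, -row) %>%          # long format: one row per (row, key)
  group_by(key) %>%
  mutate(avg    = mean(value),          # mean and sd are computed once per key
         dev    = sd(value),
         scaled = (value - avg) / dev) %>%
  ungroup() %>%
  select(row, key, scaled) %>%          # keep only what is spread back
  spread(key, scaled) %>%               # wide again: columns row, x, y, z
  arrange(row)
```

The avg and dev columns could be spread back in the same way, one value column at a time, and joined on row.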

score 2 · Answer 2 · answered Nov 05 '18 at 10:06

This might work for you:

avg <- quo(mean(.))
dev <- quo(sd(.))
res <- df %>% 
  mutate_all(funs(avg = !!avg, dev = !!dev, scaled = (. - !!avg) / !!dev))

Confirm that it works:

res0 <- df %>% 
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - mean(.)) / sd(.)))
identical(res, res0)
# [1] TRUE

moodymudskipper

Technically, this still computes `mean(.)` and `sd(.)` twice, but this is a clever example of expression arithmetic. – Artem Sokolov Nov 19 '18 at 16:29
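If the goal is strictly to evaluate mean and sd once per column, one possible workaround is to step outside funs altogether. The sketch below is an assumption-laden alternative, not part of the answer above: scale_once is a hypothetical helper, and base R lapply and cbind stand in for mutate_all.

```r
library(dplyr)

set.seed(111)
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)

# Hypothetical helper: computes mean and sd once, then reuses them.
scale_once <- function(v) {
  avg <- mean(v)
  dev <- sd(v)
  data.frame(avg = avg, dev = dev, scaled = (v - avg) / dev)
}

parts <- lapply(df, scale_once)          # one small data frame per column
stats <- do.call(cbind, parts)           # names come out as x.avg, x.dev, ...
names(stats) <- sub(".", "_", names(stats), fixed = TRUE)  # x_avg, x_dev, ...

res_once <- bind_cols(df, stats)
```

The column order differs from the mutate_all result (statistics are grouped by source column here), so the two are not expected to be identical without reordering.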

score 1 · Accepted Answer · answered Nov 04 '18 at 22:14

This seems a little convoluted, but it works:

scaled <- function(col_name, x, y) {
  col_name <- deparse(substitute(col_name))
  avg <- eval.parent(as.symbol(paste0(col_name, x)))
  dev <- eval.parent(as.symbol(paste0(col_name, y)))
  (eval.parent(as.symbol(col_name)) - avg) / dev
}

df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = scaled(., "_avg", "_dev")))
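The helper appears to depend on mutate_all replacing . with the current column's symbol before the call is evaluated, so that deparse(substitute(col_name)) yields the column name as a string. That mechanism can be seen in isolation with a small sketch; arg_name and with_suffix are hypothetical names used only for illustration.

```r
# Hypothetical helper: returns the name of the variable passed in,
# using the same deparse(substitute()) idiom as scaled() above.
arg_name <- function(col) deparse(substitute(col))

# Hypothetical helper: builds the derived column name the same way
# scaled() does with paste0().
with_suffix <- function(col, suffix) paste0(deparse(substitute(col)), suffix)

x <- c(1, 2, 3)
name_of_x <- arg_name(x)            # the string "x", not the values of x
avg_name  <- with_suffix(x, "_avg") # the string "x_avg"
```

In scaled, the resulting string is turned back into a symbol and evaluated in the calling frame, which is why the x_avg and x_dev columns must already exist by the time the scaled argument is processed.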
