How to avoid repeating code in dplyr::mutate() call with multiple arguments?

Question

Problem

I am transitioning to dplyr from base R.

I would like to shorten the following code to respect the DRY (Don't Repeat Yourself) principle:

mtcars %>% mutate(w = rowMeans(select(., mpg:disp), na.rm = TRUE),
                  x = rowMeans(select(., hp:wt), na.rm = TRUE),
                  y = rowMeans(select(., qsec:am), na.rm = TRUE),
                  z = rowMeans(select(., gear:carb), na.rm = TRUE))

or

mtcars %>% rowwise() %>% mutate(w = mean(mpg:disp, na.rm = TRUE),
                                x = mean(hp:wt, na.rm = TRUE),
                                y = mean(qsec:am, na.rm = TRUE),
                                z = mean(gear:carb, na.rm = TRUE))
# Note: this one produced an error with my own data

Goal

The goal is to compute the means of different scales in a data frame from a single call. As you can see, the rowMeans, select, and na.rm arguments repeat several times (imagine I have several more variables than for this example).

Attempts

I was trying to come up with an across() solution,

mtcars %>% mutate(across(mpg:carb, mean, .names = "mean_{col}"))

But it doesn't produce the correct outcome because I don't see how to specify different column arguments for w:z. Using the c_across from the documentation example and we are back to repeating code:

mtcars %>% rowwise() %>% mutate(w = mean(c_across(mpg:disp), na.rm = TRUE),
                                x = mean(c_across(hp:wt), na.rm = TRUE),
                                y = mean(c_across(qsec:am), na.rm = TRUE),
                                z = mean(c_across(gear:carb), na.rm = TRUE))

I am tempted to resort to lapply or a custom function but I feel like it would be defeating the purpose of adapting to dplyr and the new across() argument.

Edit: To clarify, I want to avoid calling rowMeans, select, and na.rm more than once.

Related threads: 1, 2, 3.

akrun · Answer 1 · 2020-07-22T01:42:55.250

2

We don't need rowwise, instead use select with rowMeans which is vectorized. In order to make this easier, a function can be created

f1 <- function(dat, nm1) {
          dat %>%
            select({{nm1}}) %>%
             rowMeans(na.rm = TRUE)
    }

mtcars %>% mutate(w = f1(dat = ., nm1 = mpg:disp),
                  x = f1(dat = ., nm1 = hp:wt),
                  y = f1(dat = ., nm1 = qsec:am),
                  z = f1(dat = ., nm1= gear:carb)  )

edited Jul 22 '20 at 01:42

answered Jul 22 '20 at 01:16

akrun

874,273
37
540
662

Thanks for the suggestion. I don't feel like the code is shorter than my first solution though. To clarify, I want to avoid calling `rowMeans`, `select`, and `na.rm` more than once (I've edited my question in this regard). – rempsyc Jul 22 '20 at 01:21
@RemPsyc you may do this with a loop using `map` or create a function as in the update – akrun Jul 22 '20 at 01:34
I see. So there is no way to do this in `dplyr` without resorting to a function/loop/`lapply`. I was told that one could do away with `lapply` with `dplyr`, thus my confusion. Thanks for the function though, I don't think I would have figured it out by myself. – rempsyc Jul 22 '20 at 01:37
@RemPsyc this is a custom range. You may be able to create a wrapper function around it to pass multiple sets of column names – akrun Jul 22 '20 at 01:47

score 2 · Answer 2 · answered Jul 22 '20 at 02:56

Use a custom function (but organize it a bit differently to reduce repeating code)

mm <- function(data, new_col, cols_to_mut) {
    data %>%
        mutate(
            {{ new_col }} := mean(c_across({{ cols_to_mut }}), na.rm=TRUE)
        )
}

mtcars %>% 
    rowwise %>% 
    mm(w, mpg:disp) %>%
    mm(x, hp:wt) %>%
    mm(y, qsec:am) %>%
    mm(z, gear:carb) %>%
    ungroup

score 2 · Answer 3 · answered Apr 06 '22 at 02:06

Consider using purrr::reduce2 to avoid the repetition

mtcars %>% 
  reduce2(
    c("w","x", "y", "z"),
    c("mpg:disp", "hp:wt","qsec:am","gear:carb"),
    ~ ..1 %>% rowwise %>% mutate(!!..2 := mean(c_across(!!rlang::parse_expr(..3)), na.rm=TRUE)),
    .init = .)



# A tibble: 32 x 15
# Rowwise: 
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb     w     x     y     z
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4  62.3  38.8  5.82   4  
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4  62.3  38.9  6.01   4  
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1  44.9  33.1  6.87   2.5
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1  95.1  38.8  6.81   2  
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2 129.   60.5  5.67   2.5
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1  83.0  37.1  7.07   2  
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4 127.   83.9  5.28   3.5
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2  58.4  23.0  7      3  
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2  55.9  34.0  7.97   3  
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4  64.3  43.5  6.43   4  
# ... with 22 more rows

score 0 · Accepted Answer · answered May 19 '23 at 17:19

New slightly shorter solution as of dplyr 1.1.0 using the new pick() function:

library(dplyr)

mtcars %>% mutate(w = rowMeans(pick(mpg:disp), na.rm = TRUE),
                  x = rowMeans(pick(hp:wt), na.rm = TRUE),
                  y = rowMeans(pick(qsec:am), na.rm = TRUE),
                  z = rowMeans(pick(gear:carb), na.rm = TRUE)) %>% 
  head()
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb         w
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4  62.33333
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4  62.33333
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1  44.93333
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1  95.13333
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 128.90000
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1  83.03333
#>                          x        y   z
#> Mazda RX4         38.84000 5.820000 4.0
#> Mazda RX4 Wag     38.92500 6.006667 4.0
#> Datsun 710        33.05667 6.870000 2.5
#> Hornet 4 Drive    38.76500 6.813333 2.0
#> Hornet Sportabout 60.53000 5.673333 2.5
#> Valiant           37.07333 7.073333 2.0

Explanation: the new pick() function now allows us to avoid specifying the dot argument as in select().

^{Created on 2023-05-19 with reprex v2.0.2}

How to avoid repeating code in dplyr::mutate() call with multiple arguments?

Problem

Goal

Attempts

4 Answers4