Problem
I am transitioning to dplyr from base R.
I would like to shorten the following code to respect the DRY (Don't Repeat Yourself) principle:
library(dplyr)
mtcars %>% mutate(w = rowMeans(select(., mpg:disp), na.rm = TRUE),
                  x = rowMeans(select(., hp:wt), na.rm = TRUE),
                  y = rowMeans(select(., qsec:am), na.rm = TRUE),
                  z = rowMeans(select(., gear:carb), na.rm = TRUE))
or
mtcars %>% rowwise() %>% mutate(w = mean(mpg:disp, na.rm = TRUE),
                                x = mean(hp:wt, na.rm = TRUE),
                                y = mean(qsec:am, na.rm = TRUE),
                                z = mean(gear:carb, na.rm = TRUE))
# Note: this one produced an error with my own data
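A likely explanation for the rowwise() version: inside mutate() the columns are data-masked, so `mpg:disp` is base R's `:` operator applied to the two values of the current row, not a column range. If either value is NA, `:` stops with an "NA/NaN argument" error, which may be what happened with other data. A minimal check on the first row of mtcars (the names `first_row`, `w_colon` and `w_range` are hypothetical):

```r
library(dplyr)

# In a rowwise() mutate, mpg and disp are the single values of the current row,
# so for row 1 (mpg = 21, disp = 160) mpg:disp is the sequence 21, 22, ..., 160.
first_row <- mtcars[1, ] %>%
  rowwise() %>%
  mutate(w_colon = mean(mpg:disp, na.rm = TRUE),            # mean of 21:160
         w_range = mean(c_across(mpg:disp), na.rm = TRUE))  # mean of mpg, cyl, disp

select(first_row, w_colon, w_range)
```

Only c_across() (or rowMeans() over selected columns) treats `mpg:disp` as the columns mpg, cyl and disp.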
Goal
The goal is to compute the means of several scales in a data frame in a single call. As you can see, rowMeans, select, and the na.rm argument are repeated for every scale (imagine a data set with many more variables than this example).
Attempts
I tried to come up with an across() solution:
mtcars %>% mutate(across(mpg:carb, mean, .names = "mean_{col}"))
But it doesn't produce the correct outcome, because I don't see how to give each of w, x, y and z its own set of columns. Using c_across() as in the documentation example brings us back to repeating code:
mtcars %>% rowwise() %>% mutate(w = mean(c_across(mpg:disp), na.rm = TRUE),
                                x = mean(c_across(hp:wt), na.rm = TRUE),
                                y = mean(c_across(qsec:am), na.rm = TRUE),
                                z = mean(c_across(gear:carb), na.rm = TRUE))
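For comparison, the `across(mpg:carb, mean, .names = "mean_{col}")` attempt has no rowwise() grouping, so mean() is applied to each whole column: every `mean_*` column holds one column average repeated on all 32 rows, not a per-row scale mean. A quick check (`col_means` is a hypothetical name):

```r
library(dplyr)

# across() applies mean() to each selected column as a whole,
# so the result is one value per column, recycled down every row
col_means <- mtcars %>% mutate(across(mpg:carb, mean, .names = "mean_{col}"))

n_distinct(col_means$mean_mpg)  # 1
```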
I am tempted to resort to lapply or a custom function, but I feel that would defeat the purpose of adapting to dplyr and the new across() function.
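A custom function does not have to give up across(). A minimal sketch, assuming dplyr 1.0 or later: a hypothetical helper `row_mean()` forwards a tidyselect range with `{{ }}` into across(), which, called without a function, returns the selected columns of the data frame currently being mutated. rowMeans() and na.rm are then written once:

```r
library(dplyr)

# Hypothetical helper: row mean over a tidyselect range such as mpg:disp.
# It only works when called inside a data-masking verb like mutate().
row_mean <- function(cols) {
  rowMeans(across({{ cols }}), na.rm = TRUE)
}

result <- mtcars %>%
  mutate(w = row_mean(mpg:disp),
         x = row_mean(hp:wt),
         y = row_mean(qsec:am),
         z = row_mean(gear:carb))
```

In dplyr 1.1.0 and later, `pick({{ cols }})` is the documented way to grab a set of columns without applying a function, and can replace the across() call in the helper.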
Edit: To clarify, I want to avoid writing rowMeans, select, and na.rm more than once.
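A different route that needs neither rowMeans(), select() nor lapply() is to reshape to long format and let group_by() and summarise() do the averaging, so mean() and na.rm appear exactly once. A minimal sketch, assuming tidyr is installed; `scale_map`, `row_id`, `scale_means` and `result` are hypothetical names, and the column-to-scale assignment mirrors mpg:disp, hp:wt, qsec:am and gear:carb:

```r
library(dplyr)
library(tidyr)

# Hypothetical lookup table: which scale each column belongs to
scale_map <- tibble(variable = names(mtcars),
                    scale    = rep(c("w", "x", "y", "z"), times = c(3, 3, 3, 2)))

scale_means <- mtcars %>%
  mutate(row_id = row_number()) %>%                        # remember the original row
  pivot_longer(-row_id, names_to = "variable", values_to = "value") %>%
  inner_join(scale_map, by = "variable") %>%               # attach the scale label
  group_by(row_id, scale) %>%
  summarise(scale_mean = mean(value, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = scale, values_from = scale_mean)  # one column per scale

# Rows come back ordered by row_id, so they line up with mtcars
result <- bind_cols(mtcars, select(scale_means, -row_id))
```

The scale definitions live in data (`scale_map`) rather than in code, which scales to many variables, at the cost of a reshape round trip.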