
Imagine the following data:

library(dplyr)

data <- tribble(
  ~a1, ~a2, ~b1, ~b2, ~c1, ~c2,
  32, 32, 50, 12, 12, 50,
  48, 20, 55, 43, 10, 42
)

For $i \in \{1, 2\}$ I want to compute $\delta_i = \frac{a_i - c_i}{(a_i + b_i)\,c_i + a_i}$, stored as the columns delta1 and delta2.
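As a sanity check of the formula, plugging in the first row of data (a1 = 32, b1 = 50, c1 = 12, a2 = 32, b2 = 12, c2 = 50) gives:

$$
\delta_1 = \frac{32 - 12}{(32 + 50)\cdot 12 + 32} = \frac{20}{1016} \approx 0.0197,
\qquad
\delta_2 = \frac{32 - 50}{(32 + 12)\cdot 50 + 32} = \frac{-18}{2232} \approx -0.00806
$$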

(I am deliberately using arbitrary numbers and an arbitrary function, so a solution should not rely on recognising and exploiting some pattern.)

The straightforward way would be to do

data <- data %>%
  mutate(
    delta1 = (a1 - c1) / ((a1 + b1) * c1 + a1),
    delta2 = (a2 - c2) / ((a2 + b2) * c2 + a2)
  )

but it introduces a lot of repetition.

I could do

delta <- function(a, b, c) {
  return((a - c) / ((a + b) * c + a))
}

data <- data %>%
  mutate(
    delta1 = delta(a1, b1, c1),
    delta2 = delta(a2, b2, c2)
  )

which makes it easy to change the delta() function later on, but there is still a lot of repetition.

My question: is there a way to compute delta1 and delta2 with a single line inside mutate()?

You might think the amount of repetition is acceptable, but I may need to compute several other terms such as $\gamma_i$ or $\alpha_i$, and duplicating lines doesn't feel like a good solution.

I thought I could solve the issue by doing

for (i in c(1, 2)) {
  data <- data %>%
    mutate("delta{i}" := delta(paste0('a', i), paste0('b', i), paste0('c', i)))
}

but I got

Error in `mutate()`:
! Problem while computing `delta1 = delta(paste0("a", i), paste0("b", i), paste0("c", i))`.
Caused by error in `a - c`:
! non-numeric argument to binary operator
Run `rlang::last_error()` to see where the error occurred.

and looping over mutate() calls feels somewhat wrong anyway.
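The error occurs because paste0('a', i) returns the string "a1", not the column a1, so delta() receives character values and `a - c` fails. A minimal sketch of the loop that does work, assuming dplyr's .data pronoun (which looks up a column by its name as a string) and the glue-style "delta{i}" := naming the question already uses:

```r
library(dplyr)

data <- tribble(
  ~a1, ~a2, ~b1, ~b2, ~c1, ~c2,
  32, 32, 50, 12, 12, 50,
  48, 20, 55, 43, 10, 42
)

delta <- function(a, b, c) {
  (a - c) / ((a + b) * c + a)
}

for (i in 1:2) {
  data <- data %>%
    # .data[["a1"]] refers to the column a1, whereas "a1" alone is just a string
    mutate("delta{i}" := delta(.data[[paste0("a", i)]],
                               .data[[paste0("b", i)]],
                               .data[[paste0("c", i)]]))
}
```

This fixes the error but still runs one mutate() per index, so it does not remove the loop.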

I have seen the solutions in "Mutate multiple / consecutive columns (with dplyr or base R)", "How can I mutate multiple variables using dplyr?" and "Mutating multiple columns in a data frame using dplyr", but they are much less readable than copying and pasting the line and living with the duplication.

Ideally, I am hoping to find a clever use of across() that would let me write something like mutate("delta{i}" := delta(a{i}, b{i}, c{i})).

darpich

1 Answer


With glue

You can leverage glue to build each formula as a string, then parse the strings and splice them into mutate(). This is probably the nicest and most flexible way:

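library(dplyr)
library(glue)
cols         <- c("1", "2")
# one formula string per index, e.g. "(a1 - c1) / ((a1 + b1) * c1 + a1)"
exprs        <- glue("(a{cols} - c{cols}) / ((a{cols} + b{cols}) * c{cols} + a{cols})")
names(exprs) <- glue("delta{cols}")

# parse the strings into expressions and splice them into mutate() with !!!
data |> 
  mutate(!!!rlang::parse_exprs(exprs))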

# A tibble: 2 × 8
     a1    a2    b1    b2    c1    c2 delta1   delta2
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
1    32    32    50    12    12    50 0.0197 -0.00806
2    48    20    55    43    10    42 0.0353 -0.00825
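If a long formula inside a string is hard to maintain, one variant (a sketch, not part of the original answer) keeps the formula in the question's delta() helper and uses glue only to build the calls delta(a1, b1, c1) and delta(a2, b2, c2). It assumes R >= 4.1 for |> and sets the names on the parsed list directly:

```r
library(dplyr)
library(glue)

data <- tribble(
  ~a1, ~a2, ~b1, ~b2, ~c1, ~c2,
  32, 32, 50, 12, 12, 50,
  48, 20, 55, 43, 10, 42
)

# the formula lives in one ordinary R function
delta <- function(a, b, c) (a - c) / ((a + b) * c + a)

cols <- c("1", "2")
# one call per index, e.g. delta(a1, b1, c1)
calls <- rlang::parse_exprs(glue("delta(a{cols}, b{cols}, c{cols})"))
names(calls) <- paste0("delta", cols)

result <- data |>
  mutate(!!!calls)
```

Adding another term such as gamma_i then only needs another helper function and another glue() template.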

With across

If you want to do it with across(), you can combine several across() calls like so:

library(dplyr)
data %>% 
  mutate((across(starts_with("a"), .names = "delta{sub('a', '', .col)}") -
            across(starts_with("c"))) / 
           ((across(starts_with("a")) + across(starts_with("b"))) * 
              across(starts_with("c")) + across(starts_with("a"))))

# A tibble: 2 × 8
     a1    a2    b1    b2    c1    c2 delta1   delta2
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
1    32    32    50    12    12    50 0.0197 -0.00806
2    48    20    55    43    10    42 0.0353 -0.00825

By pivoting

But you should probably pivot to long format and back to wide instead:

library(dplyr)
library(tidyr)
data %>% 
  mutate(rown = row_number()) %>% 
  pivot_longer(-rown,
               names_to = c(".value", "number"), 
               names_pattern = "([a-z])(\\d)") %>% 
  group_by(rown) %>% 
  mutate(delta = (a - c) / ((a + b) * c + a)) %>% 
  pivot_wider(names_from = number, 
              values_from = a:delta, 
              names_sep = "")

# A tibble: 2 × 9
# Groups:   rown [2]
   rown    a1    a2    b1    b2    c1    c2 delta1   delta2
  <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
1     1    32    32    50    12    12    50 0.0197 -0.00806
2     2    48    20    55    43    10    42 0.0353 -0.00825
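A sketch of how the pivoting approach scales to several terms: in long format each extra term costs one line of mutate(). Because the formulas are vectorised, group_by() is not strictly needed, so this variant drops it and returns an ungrouped tibble. The ratio column is a hypothetical stand-in for another per-index term; it is not from the question.

```r
library(dplyr)
library(tidyr)

data <- tribble(
  ~a1, ~a2, ~b1, ~b2, ~c1, ~c2,
  32, 32, 50, 12, 12, 50,
  48, 20, 55, 43, 10, 42
)

delta <- function(a, b, c) (a - c) / ((a + b) * c + a)

result <- data %>%
  mutate(rown = row_number()) %>%
  # split names like "a1" into the value column "a" and the index "1"
  pivot_longer(-rown,
               names_to = c(".value", "number"),
               names_pattern = "([a-z])(\\d)") %>%
  mutate(delta = delta(a, b, c),
         ratio = a / c) %>%  # hypothetical extra term, one line per term
  pivot_wider(names_from = number,
              values_from = a:ratio,
              names_sep = "")
```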
Maël
    Thanks a lot for the answer! I had seen some version of all of these but couldn't pull it off myself. The `glue` approach is the closest to what I had in mind, but I'm not sure I'll remember the logic a few months down the line... The `across` approach introduces a lot of repetition as well, and since some of my formulas are even longer than the ones I used in the question, it makes everything unreadable. As you say, the `pivot` approach is probably the most readable and the most logical. – darpich Jan 10 '23 at 08:31