Dynamic if else for both source and replacement columns in dplyr across

Question

I have this table:

df <- data.frame(value_2022 = c(1, NA, 3), 
               volume_2022 = c(NA, 2, 3), 
               value_2022_replacement = c(1.5, 2.5, 3.5),
               volume_2022_replacement = c(0.5, 1.5, 2.5))
df
#>   value_2022 volume_2022 value_2022_replacement volume_2022_replacement
#> 1          1          NA                    1.5                     0.5
#> 2         NA           2                    2.5                     1.5
#> 3          3           3                    3.5                     2.5

I would like to programmatically replace the NA values of each 2022 column with the values from its corresponding _replacement column using across. My code looks like this:

df %>% 
  mutate(across(matches("^v.+2022$"), \(x) ifelse(is.na(x), 
                                                  {replacewithcorresponding "_replacement" variable}, 
                                                  x)))

Is there a way to replace the placeholder {replacewithcorresponding "_replacement" variable} with something that does this for any number of columns matching the {same name}_2022_replacement pattern?

score 1 · Answer 1 · answered Mar 30 '23 at 11:59

We can use the {dplyover} package for this. Disclaimer: I'm the maintainer and the package is not on CRAN.

The easy way is across2(), which requires the two sets of columns to be in corresponding order:

library(dplyr)
library(dplyover)


df %>% 
  mutate(across2(ends_with("_2022"), # referred to as .x below
                 ends_with("_2022_replacement"), # referred to as .y below
                 ~ ifelse(is.na(.x), .y, .x),
                 .names = "{xcol}"
                 )
         )
#>   value_2022 volume_2022 value_2022_replacement volume_2022_replacement
#> 1        1.0         0.5                    1.5                     0.5
#> 2        2.5         2.0                    2.5                     1.5
#> 3        3.0         3.0                    3.5                     2.5
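
Since across2() relies on the two selections lining up, it may be worth checking the pairing before running it. The following is a minimal sketch that uses only dplyr's select helpers and base R; the names x_cols, y_cols and order_ok are made up for illustration and are not part of {dplyover}.

```r
library(dplyr)

df <- data.frame(value_2022 = c(1, NA, 3),
                 volume_2022 = c(NA, 2, 3),
                 value_2022_replacement = c(1.5, 2.5, 3.5),
                 volume_2022_replacement = c(0.5, 1.5, 2.5))

# Column names picked up by each selection, in data frame order
x_cols <- names(select(df, ends_with("_2022")))
y_cols <- names(select(df, ends_with("_2022_replacement")))

# TRUE only if the i-th source column lines up with the i-th replacement column
order_ok <- identical(paste0(x_cols, "_replacement"), y_cols)
order_ok
```

If order_ok is FALSE, reordering the columns first (or using the name-based approach) would be the safer route.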

The safer but slightly more verbose option is dplyover::over(). Here we first extract the variable stems with cut_names() and then use .("") to construct and evaluate the variable names as strings inside the function passed to .fns:

df %>% 
  mutate(over(cut_names("_replacement"), # extracts c("value_2022","volume_2022")
                 ~ ifelse(is.na(.("{.x}")),
                          .("{.x}_replacement"),
                          .("{.x}")),
                 .names = "{x}"
                 )
         )
#>   value_2022 volume_2022 value_2022_replacement volume_2022_replacement
#> 1        1.0         0.5                    1.5                     0.5
#> 2        2.5         2.0                    2.5                     1.5
#> 3        3.0         3.0                    3.5                     2.5

Data from OP


df <- data.frame(value_2022 = c(1, NA, 3), 
                 volume_2022 = c(NA, 2, 3), 
                 value_2022_replacement = c(1.5, 2.5, 3.5),
                 volume_2022_replacement = c(0.5, 1.5, 2.5))

Created on 2023-03-30 with reprex v2.0.2

score 1 · Answer 2 · answered Mar 30 '23 at 12:09

Here's a dplyr solution that uses the cur_data() and cur_column() functions. The spacing of my mutate statement isn't normally how I'd format it, but I think this makes it a little easier to read for demonstration purposes.

df <- data.frame(value_2022 = c(1, NA, 3), 
                 volume_2022 = c(NA, 2, 3), 
                 value_2022_replacement = c(1.5, 2.5, 3.5),
                 volume_2022_replacement = c(0.5, 1.5, 2.5))

df %>% 
  mutate(
    across(
      matches("^v.+2022$"),
      \(x) ifelse(is.na(x), cur_data()[[paste(cur_column(), 'replacement', sep = '_')]], x)
    )
  )

  value_2022 volume_2022 value_2022_replacement volume_2022_replacement
1        1.0         0.5                    1.5                     0.5
2        2.5         2.0                    2.5                     1.5
3        3.0         3.0                    3.5                     2.5
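
Note that cur_data() was deprecated in dplyr 1.1.0 in favour of pick(). A sketch of the same ifelse() approach with pick() might look like the following, assuming dplyr >= 1.1.0 and R >= 4.1 (for the \(x) lambda syntax); the names res and repl are only illustrative.

```r
library(dplyr) # pick() needs dplyr >= 1.1.0

df <- data.frame(value_2022 = c(1, NA, 3),
                 volume_2022 = c(NA, 2, 3),
                 value_2022_replacement = c(1.5, 2.5, 3.5),
                 volume_2022_replacement = c(0.5, 1.5, 2.5))

res <- df %>%
  mutate(
    across(
      matches("^v.+2022$"),
      \(x) {
        # Look up the "<current column>_replacement" column by name
        repl <- pick(all_of(paste0(cur_column(), "_replacement")))[[1]]
        ifelse(is.na(x), repl, x)
      }
    )
  )
res
```

Wrapping the name in all_of() makes the lookup fail with an error if a replacement column is missing, rather than silently selecting nothing.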

score 1 · Accepted Answer · answered Mar 30 '23 at 14:39

Using coalesce

library(dplyr) # version >= 1.1.0
library(stringr)
df %>%
  mutate(across(matches("\\d{4}$"), ~ coalesce(.x,
    pick(str_c(cur_column(), '_replacement'))[[1]])))

-output

  value_2022 volume_2022 value_2022_replacement volume_2022_replacement
1        1.0         0.5                    1.5                     0.5
2        2.5         2.0                    2.5                     1.5
3        3.0         3.0                    3.5                     2.5
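
For comparison, the same name-based pairing can be sketched in base R without dplyr. This is only an illustration of the idea, not part of the answer above; it assumes every replacement column is named exactly "<target>_replacement", and the variable names repl_cols and target are hypothetical.

```r
df <- data.frame(value_2022 = c(1, NA, 3),
                 volume_2022 = c(NA, 2, 3),
                 value_2022_replacement = c(1.5, 2.5, 3.5),
                 volume_2022_replacement = c(0.5, 1.5, 2.5))

# All columns that follow the "<name>_replacement" pattern
repl_cols <- grep("_replacement$", names(df), value = TRUE)

for (r in repl_cols) {
  # Strip the suffix to get the column whose NAs should be filled
  target <- sub("_replacement$", "", r)
  if (target %in% names(df)) {
    df[[target]] <- ifelse(is.na(df[[target]]), df[[r]], df[[target]])
  }
}
df
```

Because the loop is driven by the replacement columns that actually exist, it handles any number of pairs and skips replacement columns that have no matching target.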
