How best to create a new column for each two-column comparison using purrr?

Question

Say I have the following dataframe:

ABC1_old <- c(1, 5, 3, 4, 3, NA, NA, NA, NA, NA)
ABC2_old <- c(4, 2, 1, 1, 5, NA, NA, NA, NA, NA)
ABC1_adj <- c(NA, NA, NA, NA, NA, 5, 5, 1, 2, 4)
ABC2_adj <- c(NA, NA, NA, NA, NA, 3, 2, 1, 4, 2)

df <- data.frame(ABC1_old, ABC2_old, ABC1_adj, ABC2_adj)

I want to create a column that compares each pair of ABCn_old with its corresponding ABCn_adj. (So ABC1_old would be compared against ABCn_adj, etc.) The resulting column would be called ABCn_new. The evaluation would be that if ABCn_old is NA, fill in the blank with the corresponding value in ABCn_adj, otherwise use ABCn_old's value. The new columns would look like this:

df$ABC1_new <- c(1, 5, 3, 4, 3, 5, 5, 1, 2, 4)
df$ABC2_new <- c(4, 2, 1, 1, 5, 3, 2, 1, 4, 2)

I know a simple mutate could work here, but I would like to use some kind of tidyverse looping via purrr if possible since the dataset is much larger in reality. Any ideas for the best way to achieve this?

Onyambu · Accepted Answer · 2021-05-16T16:21:46.297

map_dfc(split.default(df, str_remove(names(df), "_.*")), ~coalesce(!!!.x))
# A tibble: 10 x 2
    ABC1  ABC2
   <dbl> <dbl>
 1     1     4
 2     5     2
 3     3     1
 4     4     1
 5     3     5
 6     5     3
 7     5     2
 8     1     1
 9     2     4
10     4     2

Putting it together:

df %>%
   split.default(str_replace(names(.), "_.*", "_new")) %>%
   map_dfc(~coalesce(!!!.x))%>%
   cbind(df, .)
   ABC1_old ABC2_old ABC1_adj ABC2_adj ABC1_new ABC2_new
1         1        4       NA       NA        1        4
2         5        2       NA       NA        5        2
3         3        1       NA       NA        3        1
4         4        1       NA       NA        4        1
5         3        5       NA       NA        3        5
6        NA       NA        5        3        5        3
7        NA       NA        5        2        5        2
8        NA       NA        1        1        1        1
9        NA       NA        2        4        2        4
10       NA       NA        4        2        4        2

I see this is coalescing the columns, but is there a way to run an evaluation instead? E.g., if Old is NA, use Adj, otherwise use Old? — Natasha R., May 16 '21 at 16:29

score 1 · Answer 2 · answered May 16 '21 at 18:18

Using tidyverse

library(dplyr)
library(tidyr)
library(stringr)
df %>% 
   mutate(rn = row_number()) %>%
  pivot_longer(cols = -rn, names_to = c(".value", 'grp'), 
      names_sep = '_', values_drop_na = TRUE) %>% 
  select(-grp, -rn) %>% 
  rename_all(~ str_c(., '_new')) %>% bind_cols(df, .)
#   ABC1_old ABC2_old ABC1_adj ABC2_adj ABC1_new ABC2_new
#1         1        4       NA       NA        1        4
#2         5        2       NA       NA        5        2
#3         3        1       NA       NA        3        1
#4         4        1       NA       NA        4        1
#5         3        5       NA       NA        3        5
#6        NA       NA        5        3        5        3
#7        NA       NA        5        2        5        2
#8        NA       NA        1        1        1        1
#9        NA       NA        2        4        2        4
#10       NA       NA        4        2        4        2

Or using dplyr

df %>%
   mutate(across(ends_with('old'),
    ~ coalesce(., get(str_replace(cur_column(), 
     'old', 'adj'))), .names = '{.col}_new'))

score 1 · Answer 3 · edited May 28 '21 at 08:23

I have a package on github to solve this and similar problems. In this case we could use dplyover::across2 to apply one (or more) functions to two set of columns, which can be selected with tidyselect. In the .names argument we can specify "{pre}" to refer to the common prefix of both sets of columns.

library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover

df %>% 
  mutate(across2(ends_with("_old"),
                 ends_with("_adj"),
                 ~ coalesce(.x, .y),
                 .names = "{pre}_new"))

#>    ABC1_old ABC2_old ABC1_adj ABC2_adj ABC1_new ABC2_new
#> 1         1        4       NA       NA        1        4
#> 2         5        2       NA       NA        5        2
#> 3         3        1       NA       NA        3        1
#> 4         4        1       NA       NA        4        1
#> 5         3        5       NA       NA        3        5
#> 6        NA       NA        5        3        5        3
#> 7        NA       NA        5        2        5        2
#> 8        NA       NA        1        1        1        1
#> 9        NA       NA        2        4        2        4
#> 10       NA       NA        4        2        4        2

^{Created on 2021-05-16 by the reprex package (v0.3.0)}

Oh the idea behind `across2` sounds interesting. For a moment I couldn't understand how I never noticed that before till I realized you are the author. Congratulations I will check it out. — Anoushiravan R, May 16 '21 at 21:05

How best to create a new column for each two-column comparison using purrr?

3 Answers3