1

Say I have the following dataframe:

ABC1_old <- c(1, 5, 3, 4, 3, NA, NA, NA, NA, NA)
ABC2_old <- c(4, 2, 1, 1, 5, NA, NA, NA, NA, NA)
ABC1_adj <- c(NA, NA, NA, NA, NA, 5, 5, 1, 2, 4)
ABC2_adj <- c(NA, NA, NA, NA, NA, 3, 2, 1, 4, 2)

df <- data.frame(ABC1_old, ABC2_old, ABC1_adj, ABC2_adj)

I want to create a column that compares each pair of ABCn_old with its corresponding ABCn_adj. (So ABC1_old would be compared against ABCn_adj, etc.) The resulting column would be called ABCn_new. The evaluation would be that if ABCn_old is NA, fill in the blank with the corresponding value in ABCn_adj, otherwise use ABCn_old's value. The new columns would look like this:

df$ABC1_new <- c(1, 5, 3, 4, 3, 5, 5, 1, 2, 4)
df$ABC2_new <- c(4, 2, 1, 1, 5, 3, 2, 1, 4, 2)

I know a simple mutate could work here, but I would like to use some kind of tidyverse looping via purrr if possible since the dataset is much larger in reality. Any ideas for the best way to achieve this?

Natasha R.
  • 521
  • 1
  • 5
  • 11

3 Answers3

3
map_dfc(split.default(df, str_remove(names(df), "_.*")), ~coalesce(!!!.x))
# A tibble: 10 x 2
    ABC1  ABC2
   <dbl> <dbl>
 1     1     4
 2     5     2
 3     3     1
 4     4     1
 5     3     5
 6     5     3
 7     5     2
 8     1     1
 9     2     4
10     4     2

Putting it together:

df %>%
   split.default(str_replace(names(.), "_.*", "_new")) %>%
   map_dfc(~coalesce(!!!.x))%>%
   cbind(df, .)
   ABC1_old ABC2_old ABC1_adj ABC2_adj ABC1_new ABC2_new
1         1        4       NA       NA        1        4
2         5        2       NA       NA        5        2
3         3        1       NA       NA        3        1
4         4        1       NA       NA        4        1
5         3        5       NA       NA        3        5
6        NA       NA        5        3        5        3
7        NA       NA        5        2        5        2
8        NA       NA        1        1        1        1
9        NA       NA        2        4        2        4
10       NA       NA        4        2        4        2
Onyambu
  • 67,392
  • 3
  • 24
  • 53
1

Using tidyverse

library(dplyr)
library(tidyr)
library(stringr)
df %>% 
   mutate(rn = row_number()) %>%
  pivot_longer(cols = -rn, names_to = c(".value", 'grp'), 
      names_sep = '_', values_drop_na = TRUE) %>% 
  select(-grp, -rn) %>% 
  rename_all(~ str_c(., '_new')) %>% bind_cols(df, .)
#   ABC1_old ABC2_old ABC1_adj ABC2_adj ABC1_new ABC2_new
#1         1        4       NA       NA        1        4
#2         5        2       NA       NA        5        2
#3         3        1       NA       NA        3        1
#4         4        1       NA       NA        4        1
#5         3        5       NA       NA        3        5
#6        NA       NA        5        3        5        3
#7        NA       NA        5        2        5        2
#8        NA       NA        1        1        1        1
#9        NA       NA        2        4        2        4
#10       NA       NA        4        2        4        2

Or using dplyr

df %>%
   mutate(across(ends_with('old'),
    ~ coalesce(., get(str_replace(cur_column(), 
     'old', 'adj'))), .names = '{.col}_new')) 
akrun
  • 874,273
  • 37
  • 540
  • 662
1

I have a package on github to solve this and similar problems. In this case we could use dplyover::across2 to apply one (or more) functions to two set of columns, which can be selected with tidyselect. In the .names argument we can specify "{pre}" to refer to the common prefix of both sets of columns.

library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover

df %>% 
  mutate(across2(ends_with("_old"),
                 ends_with("_adj"),
                 ~ coalesce(.x, .y),
                 .names = "{pre}_new"))

#>    ABC1_old ABC2_old ABC1_adj ABC2_adj ABC1_new ABC2_new
#> 1         1        4       NA       NA        1        4
#> 2         5        2       NA       NA        5        2
#> 3         3        1       NA       NA        3        1
#> 4         4        1       NA       NA        4        1
#> 5         3        5       NA       NA        3        5
#> 6        NA       NA        5        3        5        3
#> 7        NA       NA        5        2        5        2
#> 8        NA       NA        1        1        1        1
#> 9        NA       NA        2        4        2        4
#> 10       NA       NA        4        2        4        2

Created on 2021-05-16 by the reprex package (v0.3.0)

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
TimTeaFan
  • 17,549
  • 4
  • 18
  • 39
  • 1
    Oh the idea behind `across2` sounds interesting. For a moment I couldn't understand how I never noticed that before till I realized you are the author. Congratulations I will check it out. – Anoushiravan R May 16 '21 at 21:05