I have a dataframe with ICPM codes before and after recoding of an operation.

    df1 <- tibble::tribble(
      ~ops, ~opsalt,
      "8-915, 5-847.32", "5-847.32, 5-852.f3, 8-915",
      "8-915, 5-781.30, 8-919, 5-807.4, 5-800.c1, 5-79b.81", "5-79b.81, 5-800.c1, 5-805.y, 5-807.4, 8-919, 5-781.30, 8-915",
      "5-786.1, 5-808.a4, 5-784.1u, 5-783.2d, 5-788.5e", "5-788.5e, 5-783.2d, 5-780.4d, 5-784.7d, 5-784.1u, 5-808.a4, 5-786.1",
      "8-915, 5-784.0v, 5-788.5f, 5-788.40, 5-808.b0, 5-786.k, 5-788.60, 5-788.00, 5-786.0, 5-783.2d", "5-788.00, 5-788.60, 5-786.0, 5-786.k, 5-788.40, 5-808.b0, 5-788.5f, 5-781.ad, 5-784.0v, 8-915"
    )

I want to compute two new columns that contain the codes that differ between the two existing columns.
For the first row the difference between ops and opsalt would be character(0).
The difference between opsalt and ops would be 5-852.f3.

Tried:

df <- df %>%
  mutate(ops = strsplit(ops, ",")) %>%
  mutate(opsalt = strsplit(opsalt, ","))
df <- df %>%
  rowwise() %>%
  mutate(neu_alt = list(setdiff(ops, opsalt))) %>%
  mutate(alt_neu = list(setdiff(opsalt, ops)))

This didn't work: I want to compare the individual codes within each string, not the strings as a whole.

Peter Hahn

2 Answers

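It should work if you split on ", " (comma followed by a space) in strsplit and start from df1 rather than df in your first mutate call. Splitting on "," alone leaves a leading space on every code after the first, so setdiff treats " 5-847.32" and "5-847.32" as different values.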

library(dplyr)

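df1 %>%
  # split every column into a list-column of individual codes
  mutate(across(everything(), ~ strsplit(.x, ", "))) %>%
  # setdiff has to run once per row, not once on the whole column
  rowwise() %>%
  mutate(neu_alt = list(setdiff(ops, opsalt)),
         alt_neu = list(setdiff(opsalt, ops)))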

#> # A tibble: 4 x 4
#> # Rowwise: 
#>   ops        opsalt     neu_alt   alt_neu  
#>   <list>     <list>     <list>    <list>   
#> 1 <chr [2]>  <chr [3]>  <chr [0]> <chr [1]>
#> 2 <chr [6]>  <chr [7]>  <chr [0]> <chr [1]>
#> 3 <chr [5]>  <chr [7]>  <chr [0]> <chr [2]>
#> 4 <chr [10]> <chr [10]> <chr [1]> <chr [1]>

Created on 2022-01-04 by the reprex package (v0.3.0)
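
A quick base R check on the first row of df1 shows the effect of the separator. This is only a minimal sketch; the variable names (by_comma, wrong, ops_codes, opsalt_codes) are illustrative and not part of the original post.

```r
# First row of df1 from the question, as plain strings.
ops    <- "8-915, 5-847.32"
opsalt <- "5-847.32, 5-852.f3, 8-915"

# Splitting on "," alone keeps the space that follows each comma.
by_comma <- strsplit(ops, ",")[[1]]
print(by_comma)
#> [1] "8-915"     " 5-847.32"

# Codes with and without the leading space never match, so every code
# in ops is wrongly reported as missing from opsalt.
wrong <- setdiff(by_comma, strsplit(opsalt, ",")[[1]])

# Splitting on ", " yields clean codes and the expected differences.
ops_codes    <- strsplit(ops, ", ")[[1]]
opsalt_codes <- strsplit(opsalt, ", ")[[1]]
neu_alt <- setdiff(ops_codes, opsalt_codes)   # character(0)
alt_neu <- setdiff(opsalt_codes, ops_codes)   # "5-852.f3"
```

The last two results match what the question expects for the first row.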

TimTeaFan

If you want to keep them as strings, you can try this method. If you intend to run similar operations repeatedly, I suggest keeping the list-columns instead of calling strsplit on the strings every time.

df1 %>%
  mutate(
    d = mapply(function(...) toString(setdiff(...)),
               strsplit(ops, "[ ,]+"), strsplit(opsalt, "[ ,]+"))
  )
# # A tibble: 4 x 3
#   ops                                                                                           opsalt                                                                                        d         
#   <chr>                                                                                         <chr>                                                                                         <chr>     
# 1 8-915, 5-847.32                                                                               5-847.32, 5-852.f3, 8-915                                                                     ""        
# 2 8-915, 5-781.30, 8-919, 5-807.4, 5-800.c1, 5-79b.81                                           5-79b.81, 5-800.c1, 5-805.y, 5-807.4, 8-919, 5-781.30, 8-915                                  ""        
# 3 5-786.1, 5-808.a4, 5-784.1u, 5-783.2d, 5-788.5e                                               5-788.5e, 5-783.2d, 5-780.4d, 5-784.7d, 5-784.1u, 5-808.a4, 5-786.1                           ""        
# 4 8-915, 5-784.0v, 5-788.5f, 5-788.40, 5-808.b0, 5-786.k, 5-788.60, 5-788.00, 5-786.0, 5-783.2d 5-788.00, 5-788.60, 5-786.0, 5-786.k, 5-788.40, 5-808.b0, 5-788.5f, 5-781.ad, 5-784.0v, 8-915 "5-783.2d"

(I recommend using list-columns, though, as demonstrated in TimTeaFan's answer.)
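
The code above only computes the difference in one direction (ops minus opsalt). If both directions are needed as strings, one possible base R sketch is the following. It assumes a small data frame built from rows 1 and 4 of df1; the names df_demo, split_codes and diff_codes are hypothetical helpers introduced here for illustration.

```r
# Rows 1 and 4 of df1 from the question, as a plain data frame.
df_demo <- data.frame(
  ops = c(
    "8-915, 5-847.32",
    "8-915, 5-784.0v, 5-788.5f, 5-788.40, 5-808.b0, 5-786.k, 5-788.60, 5-788.00, 5-786.0, 5-783.2d"
  ),
  opsalt = c(
    "5-847.32, 5-852.f3, 8-915",
    "5-788.00, 5-788.60, 5-786.0, 5-786.k, 5-788.40, 5-808.b0, 5-788.5f, 5-781.ad, 5-784.0v, 8-915"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each string on runs of commas and spaces.
split_codes <- function(x) strsplit(x, "[ ,]+")

# Hypothetical helper: per-row setdiff(a, b), collapsed back to one string.
diff_codes <- function(a, b) {
  unname(mapply(function(x, y) toString(setdiff(x, y)),
                split_codes(a), split_codes(b)))
}

df_demo$neu_alt <- diff_codes(df_demo$ops, df_demo$opsalt)  # in ops only
df_demo$alt_neu <- diff_codes(df_demo$opsalt, df_demo$ops)  # in opsalt only

print(df_demo[, c("neu_alt", "alt_neu")])
```

An empty string in either column means no code is exclusive to that side.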

r2evans