0

I am trying to join two tables based on a code created within each table that identifies a prescribed drug. The problem is that the drug code sometimes has additional numbers at the end in one table. See ex:

  • ncdnum table 1: 4988472401
  • ncdnum table 2: 49884724

is there a way to join these two tables so that table 1 is joined by table 2 with that base ndcnum code.

i tried a left join but since none of the values equal each other nothing was joined.

TopNDCCodes <- left_join(ndcnum, 
                         ndccodes, 
                         by = c("ndcnum" = "new_NDC_Product")) 

enter image description here enter image description here

Till
  • 3,845
  • 1
  • 11
  • 18
  • 1
    I presume there's a metric from `stringdist` which would help identify the number of contiguous matches, which you could use in the`fuzzyjoin` package to perform this join. Or if the base string is a predictable length, you could extract that and join with that instead. – Jon Spring Jul 31 '23 at 17:01
  • 1
    @JonSpring `fuzzyjoin` also has `fuzzyjoin::regex_*` join functions that could be useful here. E.g., match if ncdnum table 1 starts with the same sequence as ncdnum table 2. – LMc Jul 31 '23 at 17:31

1 Answers1

0

If you are certain that the first 8 digits of ndcnum always uniquely identify a row, the following should work:

ndcnum_mod <- 
  ndcnum |> 
  mutate(ndcnum = substring(ndcnum, 1, 8))

TopNDCCodes <- left_join(ndcnum_mod, 
                         ndccodes, 
                         by = c("ndcnum" = "new_NDC_Product")) 
Till
  • 3,845
  • 1
  • 11
  • 18
  • Thanks Till, unfortunately the codes can be different lengths. they are only unique to the matching code in the table, which adds additional numbers to describe how much of the drug was prescribed. looking into the other comments in the mean time. – sebastian.mendoza Jul 31 '23 at 22:11