0

I'm having trouble finding out if values partially match between 2 columns in different tables so i could update values accordingly. The problem is trying to match a longer value with smaller value and having multiple prefix values. I don't want to use any loops as my dataset is in the 1m range.

df1:

PRODUCT_8 VAT
72093234 0,2
30023000 0,2
12093236 0,2
30066012 0,2

df2:

KN
30022000
30023000
300660
72

Desired output:

PRODUCT_8 VAT
72093234 0,9
30023000 0,9
12093236 0,2
30066012 0,9

I have tried this:

df1[is.na(pmatch(df1$PRODUCT_8, df2$KN, duplicates.ok = TRUE)) == FALSE, 
                 "VAT"] <- 0.09

but pmatch doesn't understand if I want to match longer numbers with smaller. It would be perfect if I could just use startswith(df1$PRODUCT_8, df2$KN), but sadly you can't have multiple values in the prefix argument. sqldf is not complying with my demands either, don't know what's the problem with that.

kjetil b halvorsen
  • 1,206
  • 2
  • 18
  • 28
Redesfist
  • 11
  • 4
  • You can use `str_detect()` function from `stringr` package for example. `str_detect(string = df1$PRODUCT_8, pattern = paste(df2$KN, collapse = "|"))` gives you logical result TRUE/FALSE. In a tidy form, you can create a new column based on `str_detect`, then update VAT column, as: `df1 %>% mutate(is_matching = str_detect(string = PRODUCT_8, pattern = paste(df2$KN, collapse = "|"))) %>% mutate(VAT = if_else(condition = is_matching == TRUE, 0.9, VAT))` – JGr May 15 '23 at 14:55
  • Looks like an ordinary merge, although the data and the answer do not look compatible. – IRTFM May 15 '23 at 14:59
  • I thought the goal was to find out whether or not Product_8 and KN columns match, even if only partially, and then to update VAT. But I maybe misunderstood the question... and it is quite probable that the same thing can be done in a much more concise way! :) – JGr May 15 '23 at 15:09
  • @JGr that's exactly what i want. i'm gonna try out your suggestion, ty – Redesfist May 16 '23 at 05:17

0 Answers0