how to partially match between 2 columns in R

Question

I'm having trouble finding out if values partially match between 2 columns in different tables so i could update values accordingly. The problem is trying to match a longer value with smaller value and having multiple prefix values. I don't want to use any loops as my dataset is in the 1m range.

df1:

PRODUCT_8	VAT
72093234	0,2
30023000	0,2
12093236	0,2
30066012	0,2

df2:

KN
30022000
30023000
300660
72

Desired output:

PRODUCT_8	VAT
72093234	0,9
30023000	0,9
12093236	0,2
30066012	0,9

I have tried this:

df1[is.na(pmatch(df1$PRODUCT_8, df2$KN, duplicates.ok = TRUE)) == FALSE, 
                 "VAT"] <- 0.09

but pmatch doesn't understand if I want to match longer numbers with smaller. It would be perfect if I could just use startswith(df1$PRODUCT_8, df2$KN), but sadly you can't have multiple values in the prefix argument. sqldf is not complying with my demands either, don't know what's the problem with that.

You can use `str_detect()` function from `stringr` package for example. `str_detect(string = df1$PRODUCT_8, pattern = paste(df2$KN, collapse = "|"))` gives you logical result TRUE/FALSE. In a tidy form, you can create a new column based on `str_detect`, then update VAT column, as: `df1 %>% mutate(is_matching = str_detect(string = PRODUCT_8, pattern = paste(df2$KN, collapse = "|"))) %>% mutate(VAT = if_else(condition = is_matching == TRUE, 0.9, VAT))` — JGr, May 15 '23 at 14:55
Looks like an ordinary merge, although the data and the answer do not look compatible. — IRTFM, May 15 '23 at 14:59
I thought the goal was to find out whether or not Product_8 and KN columns match, even if only partially, and then to update VAT. But I maybe misunderstood the question... and it is quite probable that the same thing can be done in a much more concise way! :) — JGr, May 15 '23 at 15:09
@JGr that's exactly what i want. i'm gonna try out your suggestion, ty — Redesfist, May 16 '23 at 05:17

how to partially match between 2 columns in R

0 Answers0