I have two data frames, each with a character column that can contain literally any characters in various formats, and I would like to match the rows of one against the other on those columns.
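
    library(stringr)    # str_detect(); also re-exports the %>% pipe
    library(fuzzyjoin)  # fuzzy_left_join()

    # Keep both key columns as character vectors rather than factors
    x <- data.frame(idX = 1:3,
                    string = c("silver", "30BEDJE202AA", "30BEDJE2027"),
                    stringsAsFactors = FALSE)
    y <- data.frame(idY = letters[1:3],
                    seed = c("sliver", "30BEDJE202ABC", "30BEDJE2027BL"),
                    stringsAsFactors = FALSE)

    # Left join: a row pair matches when str_detect(string, seed) is TRUE
    x %>% fuzzy_left_join(y, by = c(string = "seed"), match_fun = str_detect)
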
Here is the result I get when running the above code:

      idX       string  idY seed
    1   1       silver <NA> <NA>
    2   2 30BEDJE202AA <NA> <NA>
    3   3  30BEDJE2027 <NA> <NA>

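A likely reason for the all-`NA` result, sketched under the assumption that `fuzzy_left_join` passes the `string` values as the first argument of `match_fun` and the `seed` values as the second: `str_detect(string, pattern)` treats its second argument as a regular expression and returns `TRUE` only when the first argument contains it. The vectors below simply repeat the example data; the names `strings`, `seeds`, `in_string`, and `in_seed` are illustrative only.

```r
library(stringr)

strings <- c("silver", "30BEDJE202AA", "30BEDJE2027")
seeds   <- c("sliver", "30BEDJE202ABC", "30BEDJE2027BL")

# Same orientation as in the join: does each string contain its seed?
in_string <- str_detect(strings, seeds)

# Reversed orientation: does each seed contain its string?
in_seed <- str_detect(seeds, strings)
```

With this data `in_string` is `FALSE` for all three pairs, and `in_seed` is `TRUE` only for the third pair, so substring detection in either direction cannot produce the desired output on its own.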
And this is what I would like to have:

      idX       string idY          seed
    1   1       silver   a        sliver
    2   2 30BEDJE202AA   b 30BEDJE202ABC
    3   3  30BEDJE2027   c 30BEDJE2027BL

Is there a way to get there?