
Reproducible example:

library(fuzzyjoin)
library(stringr)

df1 <- data.frame(x = c("Victoria Park Ave N & Pachino Blvd, Toronto, ON",
                        "The West Mall S & The Queensway, Toronto, ON",
                        "Willowdale Ave NS & Athabaska Ave, Toronto, ON"), y = c(1:3))
df2 <- data.frame(x = c("Victoria Park Ave / Pachino Blvd",
                        "Athabaska Ave / Willowdale Ave",
                        "The Queensway / The West Mall",
                        "Dundas/ Younge"), z = c(66:69))

fuzzyjoin <- df2 %>%
  fuzzy_left_join(., df1, by = "x",  match_fun = str_detect )

Expected output: (image not included)

I would appreciate any suggestions.

Dinesh
  • This seems more like a "string distance" problem than a question of whether the strings share letters. Look at [`stringdist`](https://cran.r-project.org/web/packages/stringdist/index.html). – r2evans Dec 16 '21 at 19:56
  • Thanks @r2evans. I tried `stringdist_inner_join` with `max_dist = 2`, but it didn't work out. – Dinesh Dec 16 '21 at 20:05
  • I suggest you look at the raw distances between the strings before you give up on it; `max_dist = 2` is too low. For instance, the first string from each vector has a distance of 16 (`stringdist::stringdist("Victoria Park Ave N & Pachino Blvd, Toronto, ON", "Victoria Park Ave / Pachino Blvd")`). – r2evans Dec 16 '21 at 20:07
  • For a complete look at the comparisons done with these two vectors, try `outer(df1$x, df2$x, stringdist::stringdist)` and see what the distances are; in this case, ranging from 16 to 43. – r2evans Dec 16 '21 at 20:08
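
To make the suggestion in the comments concrete, here is a minimal sketch that builds the distance matrix with the `outer()` call quoted above and then picks, for each string in `df2`, the closest string in `df1`. It assumes the `stringdist` package is installed and uses its default method; the names `d`, `nearest`, `closest`, `nearest_x` and `dist` are made up for illustration and are not from the original post.

```r
library(stringdist)

# Same data as in the question (stringsAsFactors = FALSE keeps x as character)
df1 <- data.frame(x = c("Victoria Park Ave N & Pachino Blvd, Toronto, ON",
                        "The West Mall S & The Queensway, Toronto, ON",
                        "Willowdale Ave NS & Athabaska Ave, Toronto, ON"),
                  y = 1:3, stringsAsFactors = FALSE)
df2 <- data.frame(x = c("Victoria Park Ave / Pachino Blvd",
                        "Athabaska Ave / Willowdale Ave",
                        "The Queensway / The West Mall",
                        "Dundas/ Younge"),
                  z = 66:69, stringsAsFactors = FALSE)

# Pairwise distances: rows follow df1$x, columns follow df2$x
d <- outer(df1$x, df2$x, stringdist::stringdist)
print(d)

# For each df2 string, the index of the closest df1 string and that distance
nearest <- apply(d, 2, which.min)
closest <- data.frame(df2,
                      nearest_x = df1$x[nearest],
                      dist = apply(d, 2, min))
print(closest)
```

Note that every row of `df2` gets a nearest string this way, including "Dundas/ Younge", which has no counterpart in `df1`, so some cutoff on `dist` would still be needed. The printed matrix is also a quick way to see why `max_dist = 2` matches nothing: the smallest distance mentioned above is already 16.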
