I'm using the fuzzyjoin package to match politicians with their respective regions:
library(dplyr)
library(fuzzyjoin)
x <- tibble(
  name = c("Fulvio Rossi Ciocca", "Rigoberto Del Carmen Rojas Sarapura",
           "Lorena Vergara Bravo", "Lily Perez San Martin"),
  activity = c("surgeon", "business", "public administration", "publicist")
)
# Same people, but the words of each name appear in a different order
y <- tibble(name = c("Rossi Ciocca Fulvio", "Perez San Martin Lily"), region = c(1, 5))
# Fuzzy join on the shared "name" column using string distance
z <- x %>%
  stringdist_inner_join(y, by = "name", max_dist = 10)
In my example, "Fulvio Rossi Ciocca" and "Rossi Ciocca Fulvio" are the same person. In fact, both datasets contain the same people; the names only vary in word order, e.g. "Lennon John" instead of "John Lennon".
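Since the names appear to differ only in word order, one possible workaround (an assumption about the data, not something fuzzyjoin provides) is to build a word-order-independent key and use an ordinary exact join. The helper name_key() below is hypothetical: it lowercases each name, splits it on whitespace and sorts the words, so "Lennon John" and "John Lennon" yield the same key.

```r
library(dplyr)

# Hypothetical helper: canonical form of a name with its words sorted,
# so "Fulvio Rossi Ciocca" and "Rossi Ciocca Fulvio" give the same key
name_key <- function(name) {
  words <- strsplit(tolower(trimws(name)), "\\s+")
  vapply(words, function(w) paste(sort(w), collapse = " "), character(1))
}

x <- tibble(
  name = c("Fulvio Rossi Ciocca", "Rigoberto Del Carmen Rojas Sarapura",
           "Lorena Vergara Bravo", "Lily Perez San Martin"),
  activity = c("surgeon", "business", "public administration", "publicist")
)
y <- tibble(name = c("Rossi Ciocca Fulvio", "Perez San Martin Lily"), region = c(1, 5))

# Exact left join on the key; x's rows (and order) are kept, region is NA if unmatched
z <- x %>%
  mutate(key = name_key(name)) %>%
  left_join(y %>% mutate(key = name_key(name)) %>% select(key, region), by = "key") %>%
  select(-key)
```

This only handles reordering: a name with a typo or an extra middle name would still fail to match.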
I looked through the fuzzyjoin documentation, but I can't find a way to write a working version of this pseudo-code:
x %>%
fuzzy_join(y, mode = "left", match_fun = "A ~ permutations(A)")
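A minimal sketch of the pseudo-code above, assuming fuzzy_join() accepts a plain vectorized R function as match_fun (it is given two equal-length vectors of candidate value pairs and should return one TRUE/FALSE per pair). The matcher same_words() is hypothetical, and the y column is renamed to name_y here only to keep the two name columns apart in the output.

```r
library(dplyr)
library(fuzzyjoin)

x <- tibble(
  name = c("Fulvio Rossi Ciocca", "Rigoberto Del Carmen Rojas Sarapura",
           "Lorena Vergara Bravo", "Lily Perez San Martin"),
  activity = c("surgeon", "business", "public administration", "publicist")
)
y <- tibble(name_y = c("Rossi Ciocca Fulvio", "Perez San Martin Lily"), region = c(1, 5))

# Hypothetical matcher: two names match when they contain the same set
# of words, in any order. Returns one logical per (a[i], b[i]) pair.
same_words <- function(a, b) {
  words_a <- strsplit(tolower(a), "\\s+")
  words_b <- strsplit(tolower(b), "\\s+")
  vapply(seq_along(words_a),
         function(i) setequal(words_a[[i]], words_b[[i]]),
         logical(1))
}

# Left join: keep every politician in x, attach a region where the words match
z <- x %>%
  fuzzy_join(y, by = c("name" = "name_y"), match_fun = same_words, mode = "left")
```

Because setequal() ignores both order and repeated words, "Rossi Ciocca Fulvio" matches "Fulvio Rossi Ciocca", but spelling differences are not tolerated. Note that fuzzy_join() evaluates the matcher on every pairing of unique x and y values, so the cost grows with the product of the two table sizes.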