I need to merge two datasets on columns containing names that don't match exactly. Sometimes one column is missing a name that the other has: for example, one column has "Martín Gallardo" and the other has "Martín Ricardo Gallardo". In other cases the first and last names appear in reverse order, such as "Martín Gallardo" in one and "Gallardo Martín" in the other. How can I match these in R? My first idea was to use `str_split` on both columns and match each name in one set to the name in the other set that shares the most elements, but I'm not sure how to code this.
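The `str_split` idea can be sketched roughly as below. This is a minimal sketch, not a tested solution. It uses base R's `strsplit()` instead of stringr's `str_split()` so it needs no packages. It assumes the words in a name are separated by whitespace and spelled with the same accents in both sets. The helpers `name_tokens()` and `best_token_match()` are hypothetical names. Word order is ignored because only the set of shared words is counted.

```r
# Hypothetical helper: split names into lowercase word tokens
name_tokens <- function(x) {
  strsplit(tolower(trimws(x)), "\\s+")
}

# Hypothetical helper: for each name in `x`, return the index of the name in
# `y` that shares the most distinct words (order-insensitive); NA if none
best_token_match <- function(x, y) {
  tx <- name_tokens(x)
  ty <- name_tokens(y)
  vapply(tx, function(a) {
    # number of shared words between `a` and every candidate in `y`
    shared <- vapply(ty, function(b) length(intersect(a, b)), integer(1))
    if (max(shared) == 0) NA_integer_ else which.max(shared)
  }, integer(1))
}

x <- c("martin gallardo", "gallardo martin", "raul gimenez")
y <- c("gimenez raul", "martin ricardo gallardo")
best_token_match(x, y)
# returns 2 2 1
```

On ties, `which.max()` keeps the first candidate. If ties are common, dividing the shared count by the size of the union of the two word sets (Jaccard similarity) would, for example, prefer "gallardo martin" over "martin ricardo gallardo" for "martin gallardo". If accents differ between the sets, `stringi::stri_trans_general(x, "Latin-ASCII")` could be used to strip them first.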
Thank you.
Edit: the data looks something like this:
```r
library(tibble)

A <- tibble(email = c("martingallardo23@gmail.com", "raulgimenez@gmail.com"),
            name = c("martin", "raul"), last_name = c("gallardo", "gimenez"),
            full_name = c("martin gallardo", "raul gimenez"))
A
# A tibble: 2 x 4
#   email                      name   last_name full_name
#   <chr>                      <chr>  <chr>     <chr>
# 1 martingallardo23@gmail.com martin gallardo  martin gallardo
# 2 raulgimenez@gmail.com      raul   gimenez   raul gimenez

B <- tibble(email = c("martingallardo@gmail.com", "raulgimenez2@gmail.com"),
            name = c("martin ricardo", "gimenez"), last_name = c("gallardo", "raul"),
            full_name = c("martin ricardo gallardo", "gimenez raul"), other_data = c("A", "B"))
B
# A tibble: 2 x 5
#   email                    name           last_name full_name               other_data
#   <chr>                    <chr>          <chr>     <chr>                   <chr>
# 1 martingallardo@gmail.com martin ricardo gallardo  martin ricardo gallardo A
# 2 raulgimenez2@gmail.com   gimenez        raul      gimenez raul            B
```
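Applied to this example data, the shared-word idea might look like the sketch below. It assumes `full_name` is the column to match on. It recreates A and B as base data frames with only the columns it uses, so it runs without tibble. `shared_words()` is a hypothetical helper, and the `B_email` and `B_full_name` column names are arbitrary choices.

```r
# Example data from the question, as base data frames (only the columns used here)
A <- data.frame(
  email     = c("martingallardo23@gmail.com", "raulgimenez@gmail.com"),
  full_name = c("martin gallardo", "raul gimenez"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  email      = c("martingallardo@gmail.com", "raulgimenez2@gmail.com"),
  full_name  = c("martin ricardo gallardo", "gimenez raul"),
  other_data = c("A", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct words two names share,
# ignoring case and word order
shared_words <- function(a, b) {
  ta <- strsplit(tolower(a), "\\s+")[[1]]
  tb <- strsplit(tolower(b), "\\s+")[[1]]
  length(intersect(ta, tb))
}

# Score matrix: one row per name in A, one column per name in B
scores <- outer(A$full_name, B$full_name, Vectorize(shared_words))

# Index of the best-scoring row of B for each row of A (NA if nothing is shared)
best <- apply(scores, 1, function(s) if (max(s) > 0) which.max(s) else NA_integer_)

# Copy the matched B columns onto A
matched <- A
matched$B_email     <- B$email[best]
matched$B_full_name <- B$full_name[best]
matched$other_data  <- B$other_data[best]
matched
```

Because each row of A picks its best match independently, two rows of A can end up matched to the same row of B. If a one-to-one match is required, that would need an extra check.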