Assign Id to fuzzy match name in new table - R

Question

I have two tables. Table one has an id column and a full_name column. Table two has only a full name column but the names are near-matches and not full matches. I would like to apply the id column to the second table so that the ids apply to the correct near-match names. My tables look like this:

df1 <- read.table(text="
id  full_name
1   'Tom Jones'
2   'Jim James'", header = TRUE)

df2 <- read.table(text="
full_name
'Tom Jones Jr.'
'Jim James Ii'", header = TRUE)

I would like Table 2 to end up like this:

id	full_name
1	Tom Jones Jr.
2	Jim James Ii

I have tried stringdist_join and would like to ideally do in in dplyr but I'm not sure thats possible. Any help would be appreciated. Thanks.

Hi Ben! Welcome to StackOverflow. `stringdist_left_join(df1, df2, by = "full_name", method = "jw", max_dist = 0.3)` works! But in my experience, the fuzzyjoin package functions are extremely temperamental, and fail to work on more than a couple of rows. So YMMV — Mark, Aug 22 '23 at 13:57
Mark! Thank you! This works! I'm trying to keep my code clean. How would I do this in dplyr mutating df2 and only transfer over the pid instead of all the info from df1. I know I can just do it outside of dplyr and delete un-needed rows but it would be great if I could leave it as clean as possible, if possible. Thank you! — Ben Blackburn, Aug 22 '23 at 14:42
I think you can then just select the relevant columns Ben! I would post an answer but this isn't my area of expertise, and there are a half dozen different methods of joining in stringdist_left_join, and I honestly couldn't tell you which one is best! — Mark, Aug 22 '23 at 14:44
`stringdist_left_join(df2, select(df1, full_name, id), by = "full_name", method = "jw", max_dist = 0.3)` would take each row of `df2` and link it to all matching full_name+id from `df1`, without bringing over any other columns. — Jon Spring, Aug 22 '23 at 16:56

Assign Id to fuzzy match name in new table - R

0 Answers0