In my simplified example, I have a data frame with four columns. I want to match each main_name and main_dob pair against the secondary_name and secondary_dob pairs. Row order doesn't matter: for example, if the person in row 4 of the main columns appears in row 3 of the secondary columns, I still want that reported as a match.
Below is my sample data.
main_name <- c("Arthur Lee", "Robert Frost", "Sarah Doe", "Elizabeth Smith")
main_dob <- c("3/3/93", "10/21/70", "11/25/88", "4/2/92")
secondary_name <- c("David Lee", "Robert L. Frost", "Elizabeth Smith", "Mark Roger")
secondary_dob <- c("4/4/95", "10/21/70", "4/2/92", "11/25/88")
df <- data.frame(main_name, main_dob, secondary_name, secondary_dob, stringsAsFactors = FALSE)
I would like the output to show that Arthur Lee's closest match is David Lee, along with the distance between the two names and the distance between their birthdays. Next, I would want to see that a match for Robert Frost exists: the name distance is slightly off because secondary_name includes his middle initial, but the identical birthday helps me verify that it's the same person. There is no Sarah Doe in the secondary columns, so for her I would show whichever name is closest, together with the birthday distance. Lastly, Elizabeth Smith should match Elizabeth Smith even though the two entries are on different rows in the main and secondary columns.
I am thinking of using the Jaro-Winkler (jw) distance for the names, but I am open to any ideas and help.
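To make the goal concrete, here is a rough sketch of the kind of result I am after. It is only an illustration under some assumptions: it uses base R's adist() (plain edit distance) as a stand-in for Jaro-Winkler, it assumes the dates are month/day/two-digit-year, and the names name_dist, best and result are just placeholders I made up. As far as I understand, stringdist::stringdistmatrix(..., method = "jw") from the stringdist package could be swapped in for adist() to get a Jaro-Winkler distance instead.

```r
# Sample data from above, kept as character rather than factor
df <- data.frame(
  main_name      = c("Arthur Lee", "Robert Frost", "Sarah Doe", "Elizabeth Smith"),
  main_dob       = c("3/3/93", "10/21/70", "11/25/88", "4/2/92"),
  secondary_name = c("David Lee", "Robert L. Frost", "Elizabeth Smith", "Mark Roger"),
  secondary_dob  = c("4/4/95", "10/21/70", "4/2/92", "11/25/88"),
  stringsAsFactors = FALSE
)

# Edit distance between every main_name (rows) and every secondary_name (columns).
# Assumption: adist() stands in for Jaro-Winkler here.
name_dist <- adist(df$main_name, df$secondary_name)

# For each main_name, the index of the closest secondary_name.
# which.min() keeps the first index when several names are equally close.
best <- apply(name_dist, 1, which.min)

# Parse the birthdays (assumed format: month/day/two-digit year)
main_date <- as.Date(df$main_dob, format = "%m/%d/%y")
sec_date  <- as.Date(df$secondary_dob, format = "%m/%d/%y")

# One row per main_name: closest secondary name, name distance,
# and absolute birthday difference in days
result <- data.frame(
  main_name         = df$main_name,
  matched_name      = df$secondary_name[best],
  name_distance     = name_dist[cbind(seq_along(best), best)],
  dob_distance_days = abs(as.numeric(difftime(main_date, sec_date[best], units = "days"))),
  stringsAsFactors  = FALSE
)

print(result)
```

This matches on the name alone and only reports the birthday difference alongside it. Ideally the birthday would also feed into the choice of match, for example to break ties when two names are equally close, but I am not sure how best to combine the two distances.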