Hi, I am trying to match strings in one data frame against strings in another data frame and get the nearest n matches based on a similarity score.
For example, for each value in the string_2 column of df_2, I need to find the 3 nearest matches in the string_1 column of df_1, considering only rows of df_1 with the same ID.
ID = c(100, 100,100,100,103,103,103,103,104,104,104,104)
string_1 = c("Jack Daniel","Jac","JackDan","steve","Mark","Dukes","Allan","Duke","Puma Nike","Puma","Nike","Addidas")
df_1 = data.frame(ID,string_1)
ID = c(100, 100, 185, 103,103, 104, 104,104)
string_2 = c("Jack Daniel","Mark","Order","Steve","Mark 2","Nike","Addidas","Reebok")
df_2 = data.frame(ID,string_2)
The desired output data frame, df_out, should look like this:
ID = c(100, 100,185,103,103,104,104,104)
string_2 = c("Jack Daniel","Mark","Order","Steve","Mark 2","Nike","Addidas","Reebok")
nearest_str_match_1 = c("Jack Daniel","JackDan","NA","Duke","Mark","Nike","Addidas","Nike")
nearest_str_match_2 =c("JackDan","Jack Daniel","NA","Dukes","Duke","Addidas","Nike","Puma Nike")
nearest_str_match_3 =c("Jac","Jac","NA","Allan","Allan","Puma","Puma","Addidas")
df_out = data.frame(ID,string_2,nearest_str_match_1,nearest_str_match_2,nearest_str_match_3)
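One possible way to build a data frame shaped like df_out is sketched below. It assumes that "nearest" means the smallest distance from stringdist's "jw" method, that only df_1 rows with the same ID are candidates, and that an ID with no candidates (185 here) gets NA in every match column. The helper nearest_n is a hypothetical name. Because the ranking is purely by distance, the result may not reproduce every row of the hand-written df_out above.

```r
# Sketch only: assumes the stringdist package is installed and that
# "nearest" = smallest "jw" distance among df_1 rows with the same ID.
library(stringdist)

df_1 <- data.frame(
  ID = c(100, 100, 100, 100, 103, 103, 103, 103, 104, 104, 104, 104),
  string_1 = c("Jack Daniel", "Jac", "JackDan", "steve", "Mark", "Dukes",
               "Allan", "Duke", "Puma Nike", "Puma", "Nike", "Addidas"),
  stringsAsFactors = FALSE
)
df_2 <- data.frame(
  ID = c(100, 100, 185, 103, 103, 104, 104, 104),
  string_2 = c("Jack Daniel", "Mark", "Order", "Steve", "Mark 2",
               "Nike", "Addidas", "Reebok"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n closest candidates to `query`, padded with NA
# when there are fewer than n candidates (or none at all).
nearest_n <- function(query, candidates, n = 3, method = "jw") {
  out <- rep(NA_character_, n)
  if (length(candidates) == 0) return(out)
  d <- stringdist(query, candidates, method = method)
  ranked <- candidates[order(d)]          # smallest distance first
  k <- min(n, length(ranked))
  out[seq_len(k)] <- ranked[seq_len(k)]
  out
}

# For each row of df_2, rank only the df_1 strings that share its ID.
# vapply gives a 3 x nrow(df_2) matrix, so transpose it to one row per query.
matches <- t(vapply(seq_len(nrow(df_2)), function(i) {
  nearest_n(df_2$string_2[i], df_1$string_1[df_1$ID == df_2$ID[i]])
}, character(3)))
colnames(matches) <- paste0("nearest_str_match_", 1:3)

df_out <- data.frame(df_2, matches, stringsAsFactors = FALSE)
print(df_out)
```

The per-row loop keeps the logic easy to follow; for large tables, splitting df_1 by ID once (e.g. with split()) would avoid re-filtering df_1 for every query.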
I have tried computing the distances manually with the stringdist package, using the Jaro-Winkler ('jw') method, and then picking the nearest value:
stringdist::stringdist("Jack Daniel", "Jack Daniel", method = "jw")
stringdist::stringdist("Jack Daniel", "Jac", method = "jw")
stringdist::stringdist("Jack Daniel", "JackDan", method = "jw")
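The separate calls above can likely be collapsed into one, since stringdist() recycles a single query against a whole vector of candidates. The sketch below assumes the candidates are the ID 100 strings copied from df_1; ordering the distances then gives the 3 nearest directly.

```r
library(stringdist)

# Candidate strings for ID 100, copied from df_1 in the question.
candidates <- c("Jack Daniel", "Jac", "JackDan", "steve")

# One vectorised call: the single query is recycled against every candidate.
d <- stringdist("Jack Daniel", candidates, method = "jw")
names(d) <- candidates
print(round(d, 3))

# Smaller distance = more similar, so sort ascending and keep the first 3.
top3 <- head(candidates[order(d)], 3)
print(top3)
```

Note that with the default p = 0, the "jw" method returns the plain Jaro distance; passing a prefix factor such as p = 0.1 (the allowed range is 0 to 0.25) adds the Winkler bonus for shared prefixes and can change the ranking.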
Thanks in advance