I am trying to compare col_1 in the df_1 data frame with col_2 in the df_2 data frame to get the top 3 nearest matches (the lowest score represents the nearest match) and their respective row_id. Is there also a flexible way to change the number of nearest matches N? In my case I have used the top 3, but I would like to be able to change it to the top 5, top 10, and so on.
col_1 = c("My name is john","The best ever Puma wishlist", "i have been mailing my issue daily",
"Its perfect for a day at gym")
col_2 = c("My name is jon","My Name is jhn", "My Nam is mark", "Mu Name is John",
"John is my name", "Its perfect for a day at gym&outside", "Its perfect for a outside",
"Its perfect day at gym", "Its perfect for a day at gm", "My name is john" )
row_id = c(1,2,3,4,5,6,7,8,9,10)
df_1 = data.frame(col_1)
df_2 = data.frame(col_2,row_id)
The final output data frame should be:
col_1 = c("My name is john","The best ever Puma wishlist","i have been mailing my issue daily","Its perfect for a day at gym")
nearest_1 = c("My name is John","Its perfect for a outside","Its perfect for a day at gym&outside","Its perfect for a day at gm")
nearest_1_row_id = c(10,7,6,9)
nearest_2 = c("My name is jon","Its perfect for a day at gym&outside","John is my name","Its perfect for a day at gym&outside")
nearest_2_row_id = c(1,6,5,6)
nearest_3 = c("My Name is jhn","Its perfect day at gym","My name is john","Its perfect day at gym")
nearest_3_row_id = c(3,8,4,8)
df_1_out = data.frame(col_1,nearest_1,nearest_1_row_id,nearest_2,nearest_2_row_id,nearest_3,nearest_3_row_id)
I have tried:
library(stringdist)
df_1_out = df_1
df_1_out$nearest_1 = stringdist("My name is john","My name is jon", method = 'jw')
Likewise, I need to compare each and every row. Is there an alternative method to achieve the required output?
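
To make the goal concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes the Jaro-Winkler method ('jw') from the attempt above, computes all pairwise distances in one call with `stringdistmatrix()`, and then keeps the N smallest per row. The helper name `top_n_matches` and its arguments are hypothetical, introduced only for illustration.

```r
library(stringdist)

col_1 <- c("My name is john", "The best ever Puma wishlist",
           "i have been mailing my issue daily", "Its perfect for a day at gym")
col_2 <- c("My name is jon", "My Name is jhn", "My Nam is mark", "Mu Name is John",
           "John is my name", "Its perfect for a day at gym&outside",
           "Its perfect for a outside", "Its perfect day at gym",
           "Its perfect for a day at gm", "My name is john")
df_1 <- data.frame(col_1, stringsAsFactors = FALSE)
df_2 <- data.frame(col_2, row_id = 1:10, stringsAsFactors = FALSE)

# Hypothetical helper: for each string in x, return the n closest candidates
# (smallest distance first) together with their ids.
top_n_matches <- function(x, candidates, ids, n = 3, method = "jw") {
  # Distance matrix: one row per element of x, one column per candidate
  d <- stringdistmatrix(x, candidates, method = method)
  n <- min(n, length(candidates))
  # Column k of ord holds the candidate indices for x[k], nearest first;
  # ties are broken by position, since order() is stable
  ord <- matrix(apply(d, 1, order), nrow = length(candidates))
  out <- data.frame(col_1 = x, stringsAsFactors = FALSE)
  for (k in seq_len(n)) {
    idx <- ord[k, ]  # index of the k-th nearest candidate for every row of x
    out[[paste0("nearest_", k)]] <- candidates[idx]
    out[[paste0("nearest_", k, "_row_id")]] <- ids[idx]
  }
  out
}

df_1_out <- top_n_matches(df_1$col_1, df_2$col_2, df_2$row_id, n = 3)
```

Changing `n = 3` to `n = 5` or `n = 10` adds the corresponding `nearest_k` and `nearest_k_row_id` columns. The exact ranking depends on the distance method chosen (for example 'jw', 'lv' or 'osa'), so it may not reproduce the expected table above row for row.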