Order mismatch and similarity

Question

I have two values that contain the same items but in a different order, so ideally they should count as the same.
When I calculate the string similarity, the score between them is far from the ideal score:

col_1 = c("USA,UK,APAC")
col_2 = c("UK,APAC,USA")
library(stringdist)
stringdist(col_1,col_2,  method = 'jw')

How can I identify that col_1 and col_2 are similar even though their items are in a different order? That is, is there a method to recognize that both values are effectively the same?

score 1 · Answer 1 · answered Sep 29 '21 at 21:06

If your values are consistently separated by commas, you could split on the comma, sort the pieces, and then compare them.

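# Split each value on commas and sort the resulting pieces
spl_1 <- sort(unlist(strsplit(col_1, ',')))
spl_2 <- sort(unlist(strsplit(col_2, ',')))

# Compares the sorted pieces element by element; all zeros means every piece matches
stringdist(spl_1, spl_2, method = "jw")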
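
The comparison above returns one distance per piece, and it relies on both values having the same number of pieces. If a single score per pair is preferred, one possible variation is to paste the sorted pieces back into one string before comparing. This is only a sketch: `normalize_tokens` is a hypothetical helper name, not part of stringdist, and it assumes the separator is always a plain comma.

```r
# Hypothetical helper: split on the separator, trim whitespace,
# sort the pieces, and join them back into a single string.
normalize_tokens <- function(x, sep = ",") {
  vapply(
    strsplit(x, sep, fixed = TRUE),
    function(tokens) paste(sort(trimws(tokens)), collapse = sep),
    character(1)
  )
}

col_1 <- c("USA,UK,APAC")
col_2 <- c("UK,APAC,USA")

norm_1 <- normalize_tokens(col_1)
norm_2 <- normalize_tokens(col_2)

# One Jaro-Winkler distance per pair; identical normalized strings give 0.
if (requireNamespace("stringdist", quietly = TRUE)) {
  print(stringdist::stringdist(norm_1, norm_2, method = "jw"))
}

# If only an exact "same items" check is needed, base R is enough.
same_items <- setequal(strsplit(col_1, ",")[[1]], strsplit(col_2, ",")[[1]])
print(same_items)
```

Because the helper is vectorized over its input, it should also work on whole columns with one value per row.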

Kelsey
