
I have two values that contain the same items in a different order, so ideally they should be treated as identical.
When I calculate the string similarity, the score between them is far from that ideal score.

col_1 = c("USA,UK,APAC")
col_2 = c("UK,APAC,USA")
library(stringdist)
stringdist(col_1,col_2,  method = 'jw')

How can I identify that col_1 and col_2 are the same even though their items are arranged in a different order? That is, is there a method that recognizes both values as equivalent?

san1

1 Answer


If your values are consistently separated by commas, you could split on the comma, sort the pieces, and then compare them.

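# Split each value on commas and sort the pieces so the original order no longer matters
spl_1 <- sort(unlist(strsplit(col_1, ',')))
spl_2 <- sort(unlist(strsplit(col_2, ',')))

# Compares the sorted pieces element by element; a distance of 0 means that pair is identical
stringdist(spl_1, spl_2, method = "jw")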
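If you would rather get a single score per value instead of one per item, one option is to paste the sorted pieces back together before comparing. The following is only a sketch: `normalize_tokens` is a hypothetical helper name, not part of stringdist, and it assumes the items are comma-separated and that surrounding whitespace should be ignored. If you only need a yes/no answer, base R's `setequal` compares the items without regard to order.

```r
library(stringdist)

# Hypothetical helper (not from any package): split each string on the
# separator, trim whitespace, sort the pieces, and join them back together.
normalize_tokens <- function(x, sep = ",") {
  vapply(
    strsplit(x, sep, fixed = TRUE),
    function(tokens) paste(sort(trimws(tokens)), collapse = sep),
    character(1)
  )
}

col_1 <- c("USA,UK,APAC")
col_2 <- c("UK,APAC,USA")

norm_1 <- normalize_tokens(col_1)
norm_2 <- normalize_tokens(col_2)

# One Jaro-Winkler distance per value; 0 means the normalized strings match
jw_dist <- stringdist(norm_1, norm_2, method = "jw")

# Order-insensitive exact check on the items themselves (duplicates are ignored)
same_set <- setequal(
  strsplit(col_1, ",", fixed = TRUE)[[1]],
  strsplit(col_2, ",", fixed = TRUE)[[1]]
)
```

Because the helper works on whole vectors, the same calls should apply to full data frame columns, giving one distance per row.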

Kelsey