String matching algorithms in Python

Question

I am looking for suggestions on string matching algorithms that also support non-English languages.

Previously tried algorithm:

I have tried fuzzy matching based on Levenshtein distance, using the token_sort_ratio function. It works well for most of my use cases, including non-English languages. I consider two strings a match if the ratio is above 90%. The problem is that, in the example below, "19th Century" and "18th Century" are not the same, and I do not want them treated as a match.

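# Assumption: fuzz comes from the fuzzywuzzy package (thefuzz exposes the same API)
from fuzzywuzzy import fuzz

Str1 = "19th Century"
Str2 = "18th Century"
fuzz.token_sort_ratio(Str1, Str2)
>> 92%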

If I raise the threshold to above 95%, I would miss the example below, even though these two strings should be a match:

Str1 = "Robert Jones"
Str2 = "Robert F. Jones"
fuzz.token_sort_ratio(Str1,Str2)
>> 92%
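
For anyone who wants to reproduce the two scores without installing a fuzzy-matching package, here is a rough standard-library sketch. It is a stand-in under stated assumptions, not the library's implementation: `token_sort_score` is a hypothetical helper that lowercases the input, replaces non-word characters with spaces, sorts the tokens, and compares the results with `difflib.SequenceMatcher`. It happens to agree on these two pairs but may differ from `fuzz.token_sort_ratio` on other inputs.

```python
import re
from difflib import SequenceMatcher


def token_sort_score(a: str, b: str) -> int:
    """Hypothetical stand-in for a token-sort ratio, scaled to 0..100."""
    def normalize(s: str) -> str:
        # Lowercase, turn runs of non-word characters into spaces,
        # then sort the tokens so word order does not matter.
        # \w is Unicode-aware in Python 3, so non-English letters survive.
        tokens = re.sub(r"[^\w]+", " ", s.lower()).split()
        return " ".join(sorted(tokens))

    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return round(100 * ratio)


print(token_sort_score("19th Century", "18th Century"))     # 92
print(token_sort_score("Robert Jones", "Robert F. Jones"))  # 92
```

Both pairs land on the same score, which is why a single cutoff cannot accept one and reject the other.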

score 0 · Answer 1 · answered Sep 01 '20 at 23:05

Why not try using a range instead? You could specify the range of percentages you want and loop over it. The code might take a while to run, but it should work.
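
One possible reading of this suggestion is to sweep the cutoff over a range and check which pairs each value accepts. A minimal sketch, assuming the two scores of 92 reported in the question (hard-coded here rather than recomputed); `matches_at` is a hypothetical helper name, and "match" follows the question's rule of a score strictly above the cutoff.

```python
# Scores as reported in the question; both pairs came out at 92.
scores = {
    ("19th Century", "18th Century"): 92,     # should NOT be a match
    ("Robert Jones", "Robert F. Jones"): 92,  # should be a match
}


def matches_at(threshold: int) -> list:
    """Return the pairs whose score is strictly above the threshold."""
    return [pair for pair, score in scores.items() if score > threshold]


# Try every cutoff from 90 to 95 and list what each one accepts.
for threshold in range(90, 96):
    print(threshold, matches_at(threshold))
```

With these inputs, cutoffs 90 and 91 accept both pairs and 92 through 95 accept neither, so the sweep by itself may not be enough to separate the two cases.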

oluwaferanmi Fakolujo
