I am looking for suggestions on string-matching algorithms that also support non-English languages.
Previously tried algorithm:
I have tried fuzzy matching based on Levenshtein distance, using the token_sort_ratio function. It works well for most of my use cases, including non-English languages. I consider two strings a match if the ratio is above 90%. The problem I am facing is that, in the example below, "19th Century" and "18th Century" are not the same, and I do not want them to be considered a match:
Str1 = "19th Century"
Str2 = "18th Century"
fuzz.token_sort_ratio(Str1,Str2)
>> 92%
If I raise the threshold above 95%, I would miss the example below, even though these two strings should be a match:
Str1 = "Robert Jones"
Str2 = "Robert F. Jones"
fuzz.token_sort_ratio(Str1,Str2)
>> 92%
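To make the threshold conflict easy to reproduce, here is a minimal sketch of the comparison using only the standard library. It is an approximation, not the library's actual implementation: `difflib.SequenceMatcher` stands in for the Levenshtein-based ratio, and the names `token_sort_ratio_sketch` and `is_match` are hypothetical helpers introduced for illustration.

```python
import re
from difflib import SequenceMatcher


def _sorted_tokens(text: str) -> str:
    """Lowercase, replace non-word characters with spaces, sort the tokens.

    In Python 3, \\W is Unicode-aware for str patterns, so letters from
    non-English scripts are kept.
    """
    cleaned = re.sub(r"\W", " ", text.lower())
    return " ".join(sorted(cleaned.split()))


def token_sort_ratio_sketch(s1: str, s2: str) -> int:
    """Approximate a token-sort ratio as an integer from 0 to 100.

    Assumption: difflib's ratio (2 * matches / total length) is used as a
    stand-in for the fuzzy library's scoring; results may differ slightly.
    """
    a, b = _sorted_tokens(s1), _sorted_tokens(s2)
    return round(100 * SequenceMatcher(None, a, b).ratio())


def is_match(s1: str, s2: str, threshold: int = 90) -> bool:
    """Treat two strings as a match when the ratio exceeds the threshold."""
    return token_sort_ratio_sketch(s1, s2) > threshold


if __name__ == "__main__":
    pairs = [
        ("19th Century", "18th Century"),    # should NOT match
        ("Robert Jones", "Robert F. Jones"), # should match
    ]
    for left, right in pairs:
        score = token_sort_ratio_sketch(left, right)
        print(left, "|", right, "|", score,
              "| >90:", is_match(left, right, 90),
              "| >95:", is_match(left, right, 95))
```

Both pairs score 92 in this sketch, so no single cutoff on this ratio alone can accept the second pair while rejecting the first.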