Python FuzzyWuzzy ratio: how does it work?

Question

The description of the FuzzyWuzzy ratio says:

The FuzzyWuzzy ratio raw score is a measure of the strings similarity as an int in the range [0, 100]. For two strings X and Y, the score is defined by int(round((2.0 * M / T) * 100)) where T is the total number of characters in both strings, and M is the number of matches in the two strings. The FuzzyWuzzy ratio sim score is a float in the range [0, 1] and is obtained by dividing the raw score by 100.

Then how come this score appears to be different when I change the order of the words?

 from fuzzywuzzy import fuzz

 fuzz.ratio('EMRE MERT', 'OMER CAN') / 100  # -> 0.35

 fuzz.ratio('EMRE MERT', 'CAN OMER') / 100  # -> 0.47
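For reference, the quoted definition can be sketched in Python. This is a minimal sketch, not the code of py_stringmatching or fuzzywuzzy. The quote does not say how "matches" are counted, so the sketch assumes M is the number of characters matched in the same relative order, i.e. the length of a longest common subsequence. The helper names lcs_length and ratio_raw are hypothetical.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence (characters matched in order)."""
    # prev[j] holds the LCS length of the processed prefix of x and y[:j]
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)  # extend the match on the diagonal
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a char of x or y
        prev = curr
    return prev[-1]


def ratio_raw(x: str, y: str) -> int:
    """int(round((2.0 * M / T) * 100)), assuming M is the LCS length."""
    total = len(x) + len(y)  # T: total number of characters in both strings
    if total == 0:
        return 0  # avoid division by zero; an arbitrary choice for this sketch
    matches = lcs_length(x, y)  # M, under the in-order assumption
    return int(round((2.0 * matches / total) * 100))


print(ratio_raw('EMRE MERT', 'OMER CAN'))  # 35
print(ratio_raw('EMRE MERT', 'CAN OMER'))  # 47
```

Under that assumption M itself already depends on word order: 'OMER CAN' shares three in-order characters with 'EMRE MERT' (for example 'MER'), while 'CAN OMER' shares four (' MER'), giving 6/17 and 8/17.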

Accepted Answer · answered Jun 01 '20 at 23:18

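The definition you quoted comes from the Ratio measure in the py_stringmatching package, but the function you are calling is fuzz.ratio from the fuzzywuzzy package. fuzz.ratio scores the strings with python-Levenshtein's ratio, which is based on the Levenshtein (edit) distance, when that package is installed, and falls back to difflib.SequenceMatcher otherwise.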
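To see the fallback path in action, the sketch below scores both pairs with the standard library's difflib.SequenceMatcher and rounds the ratio to an integer percentage. ratio_difflib is a hypothetical helper, not fuzzywuzzy's own function; with python-Levenshtein installed, fuzz.ratio uses that package instead, so scores on other inputs may differ slightly.

```python
from difflib import SequenceMatcher


def ratio_difflib(a: str, b: str) -> int:
    """Score in [0, 100] computed from difflib's order-preserving matches."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


query = 'EMRE MERT'
for other in ('OMER CAN', 'CAN OMER'):
    sm = SequenceMatcher(None, query, other)
    # Matching blocks are runs of identical characters found in the same order
    # in both strings; the final zero-size block is a sentinel, so skip it.
    blocks = [query[m.a:m.a + m.size] for m in sm.get_matching_blocks() if m.size]
    print(other, ratio_difflib(query, other), blocks)
# OMER CAN 35 ['MER']
# CAN OMER 47 [' ', 'MER']
```

Because matches must appear in the same order in both strings, moving "CAN" to the front frees the space to match before "MER", which adds one matched character.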

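The recursive definition of the Levenshtein distance shows that the algorithm walks through both strings character by character, in order. Reordering the characters (for example, swapping the two words) changes which characters can be aligned with each other, so it can change the distance and therefore the ratio.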
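A minimal sketch of that recursive definition (standard unit-cost Levenshtein, memoized with functools.lru_cache so it stays fast on short strings) makes the order sensitivity visible. It is not fuzzywuzzy's exact scoring: as far as I know, python-Levenshtein's ratio is based on a variant where a substitution costs 2, so these raw distances do not map directly to the scores in the question.

```python
from functools import lru_cache


def levenshtein(s: str, t: str) -> int:
    """Unit-cost Levenshtein distance via the textbook recursion (memoized)."""

    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        # Distance between the prefixes s[:i] and t[:j]
        if i == 0:
            return j  # insert the remaining j characters of t
        if j == 0:
            return i  # delete the remaining i characters of s
        cost = 0 if s[i - 1] == t[j - 1] else 1
        return min(
            dist(i - 1, j) + 1,         # deletion
            dist(i, j - 1) + 1,         # insertion
            dist(i - 1, j - 1) + cost,  # match or substitution
        )

    return dist(len(s), len(t))


print(levenshtein('EMRE MERT', 'OMER CAN'))  # 7
print(levenshtein('EMRE MERT', 'CAN OMER'))  # 6
```

Each step only compares the last characters of two prefixes, so characters can match for free only when they line up in the same order. Swapping the words lets more characters line up, and the distance drops from 7 to 6.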
