I'm trying to work through how fuzzywuzzy calculates this simple fuzz ratio:

print(fuzz.ratio("66155347", "12026599"))
25

Why is the fuzz ratio not 0, given that the two strings have a different character in every position?

The Levenshtein distance is 8 (because every character needs to be substituted), a is 8 (the length of string 1), and b is 8 (the length of string 2).

fuzz.ratio is (a + b - Levenshtein distance) / (a + b)

fuzz.ratio is (8 + 8 - 8) / (8 + 8) = 0.50

fuzz.ratio is 50

So there must also be something wrong with my math, because I'm getting 50.

How does the fuzz ratio arrive at 25?

Any guidance would be appreciated.

Thanks

  • The [source code](https://github.com/ztane/python-Levenshtein/blob/811c050ab71593879804a61347352764837d000f/Levenshtein/_levenshtein.c#L760) for `ratio()` is available if you want to see for yourself what's calculating the ratio. The fuzzywuzzy library just multiplies the result by 100 according to its source code. – Random Davis Oct 07 '20 at 20:25

1 Answer

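The fuzzywuzzy library uses a weighted version of the Levenshtein distance in which insertions and deletions cost 1 but a replacement costs 2 (the same as one deletion plus one insertion). With that weighting, the distance between the two strings is 12 rather than 8, and (8 + 8 - 12) / (8 + 8) = 0.25, which fuzzywuzzy reports as 25.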
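To check the arithmetic without installing anything, here is a minimal pure-Python sketch of that weighted distance. It is not fuzzywuzzy's actual implementation; the names `weighted_levenshtein`, `ratio_percent`, and `sub_cost` are made up for illustration, and the rounding is an assumption that does not matter for these inputs.

```python
def weighted_levenshtein(s: str, t: str, sub_cost: int = 2) -> int:
    """Edit distance with insert/delete cost 1 and a configurable substitution cost."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,                                   # delete cs
                curr[j - 1] + 1,                               # insert ct
                prev[j - 1] + (0 if cs == ct else sub_cost),   # match or substitute
            ))
        prev = curr
    return prev[-1]


def ratio_percent(s: str, t: str, sub_cost: int = 2) -> int:
    """(a + b - distance) / (a + b), scaled to 0..100 (hypothetical helper)."""
    total = len(s) + len(t)
    if total == 0:
        return 100
    return round(100 * (total - weighted_levenshtein(s, t, sub_cost)) / total)


a, b = "66155347", "12026599"
print(weighted_levenshtein(a, b, sub_cost=1), ratio_percent(a, b, sub_cost=1))  # 8 50
print(weighted_levenshtein(a, b, sub_cost=2), ratio_percent(a, b, sub_cost=2))  # 12 25
```

With `sub_cost=1` this reproduces the 50 from the question; with `sub_cost=2` it reproduces the 25 that `fuzz.ratio` returns.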

Johannes Riecken
  • Thanks for the reply. So if all characters are being replaced, how is that not 16 instead of 12 since there are 8 characters? – nopaynenogain Oct 07 '20 at 20:33
  • Because a smaller distance can be achieved by inserting and deleting fewer than all of the characters. Both strings contain a "6" and, somewhere after it, a "5", for example, so those two characters can stay in place and only the rest needs to be deleted or inserted. If the two strings shared no characters at all, like "01234567" and "abcdefgh", then the fuzz ratio would indeed be 0. – Johannes Riecken Oct 07 '20 at 20:51
  • Thanks for the additional context! I just tested this out with different numbers and got 0. – nopaynenogain Oct 07 '20 at 20:55
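The point in the comments about shared characters can be made concrete. When a replacement costs 2, the weighted distance equals a + b - 2 × (length of the longest common subsequence), because every character outside that subsequence is either deleted or inserted. The sketch below is only an illustration of that identity, not code from fuzzywuzzy; `lcs_length` is a hypothetical helper name.

```python
def lcs_length(s: str, t: str) -> int:
    """Length of the longest common subsequence of s and t."""
    prev = [0] * (len(t) + 1)
    for cs in s:
        curr = [0]
        for j, ct in enumerate(t, start=1):
            if cs == ct:
                curr.append(prev[j - 1] + 1)            # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a character in s or t
        prev = curr
    return prev[-1]


for a, b in [("66155347", "12026599"), ("01234567", "abcdefgh")]:
    shared = lcs_length(a, b)
    distance = len(a) + len(b) - 2 * shared
    print(a, b, shared, distance)
# 66155347 12026599 2 12
# 01234567 abcdefgh 0 16
```

Two characters (for example "6" followed by "5") can be kept, so the distance is 16 - 4 = 12 rather than 16. With no shared characters the distance is the full 16 and the ratio drops to 0.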