I understand how the basic FuzzyWuzzy scorers and their scores work. However, I came across a case where FuzzyWuzzy returns a high WRatio score even though the two strings do not seem similar at all (output image below for reference).

Can anyone please explain why it behaves this way?

Output for reference

Fahim Uz Zaman
Shreyesh Desai

1 Answer

In your case the two strings:

"The Boston Globe's Fresh Start program embraces the right to be forgotten"
"Subscribe to Continue Reading"

have a length difference of over 50%, so WRatio uses the partial versions of most of its algorithms and weights them slightly lower. For these two strings, fuzz.partial_token_set_ratio returns a score of 100, since both sentences contain the word "to". As with token_set_ratio, this score is weighted by 0.95, and then by a further 0.9 because it is the partial version. Your final score is 100 * 0.95 * 0.9 = 85.5 -> round(85.5) = 86.
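
To make the arithmetic concrete, here is a minimal sketch that retraces the steps above using only the standard library. It does not call FuzzyWuzzy itself: the partial_token_set_ratio result of 100 is hard-coded from the explanation above, the tokens helper is a hypothetical stand-in for the library's preprocessing, and the 1.5 threshold is my assumption for what "over 50%" corresponds to.

```python
import re

# Weights quoted in the explanation above.
TOKEN_SCALE = 0.95    # applied to the token-based scorers
PARTIAL_SCALE = 0.9   # applied to the partial scorers
# Assumption: a length ratio above 1.5 is what "over 50%" refers to.
PARTIAL_THRESHOLD = 1.5


def tokens(text):
    """Hypothetical helper: lowercase, turn non-alphanumerics into spaces, split into a set."""
    return set(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


s1 = "The Boston Globe's Fresh Start program embraces the right to be forgotten"
s2 = "Subscribe to Continue Reading"

# Step 1: the longer string is more than 1.5x the shorter one, so partial scorers apply.
length_ratio = max(len(s1), len(s2)) / min(len(s1), len(s2))
uses_partial = length_ratio > PARTIAL_THRESHOLD

# Step 2: the only token the two sentences share is "to".
shared = tokens(s1) & tokens(s2)

# Step 3: hard-coded from the explanation above, not computed here.
partial_token_set = 100

# Step 4: apply both weights, then round to an integer.
score = round(partial_token_set * TOKEN_SCALE * PARTIAL_SCALE)

print(uses_partial, shared, score)
```

Running this prints True, the single shared token, and 86, which matches the score you are seeing.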

maxbachmann