I'm working with fuzzywuzzy in Python, and while it claims to be based on Levenshtein distance, I find that pairs of strings differing by a single character get different scores. For example:
>>> from fuzzywuzzy import fuzz
>>> fuzz.ratio("vendedor", "vendedora")
94
>>> fuzz.ratio("estagiário", "estagiária")
90
>>> fuzz.ratio("abcdefghijlmnopqrst", "abcdefghijlmnopqrsty")
97
>>> fuzz.ratio("abc", "abcd")
86
>>> fuzz.ratio("a", "ab")
67
I would expect the Levenshtein distance to be the same in every example, since each pair differs by exactly one character. I understand that the result is not the raw distance but some kind of "equality percentage".
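To check that expectation, here is a minimal sketch of the classic dynamic-programming edit distance, where insertions, deletions, and substitutions each cost 1. The function name `levenshtein` and the `pairs` list are only illustrative and are not part of fuzzywuzzy.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete, and substitute."""
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


pairs = [
    ("vendedor", "vendedora"),
    ("estagiário", "estagiária"),
    ("abcdefghijlmnopqrst", "abcdefghijlmnopqrsty"),
    ("abc", "abcd"),
    ("a", "ab"),
]

distances = [levenshtein(a, b) for a, b in pairs]
print(distances)
```

With this definition every pair is one edit apart, so the differing scores cannot come from the raw distance alone.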
I tried to work out how the score is computed, but I cannot figure it out. My very long string scores 97 and the very short one 67, which suggests that the longer the string, the less a single character matters. However, that does not hold for "vendedor"/"vendedora" versus "estagiário"/"estagiária": the latter pair is longer than the former, yet it scores lower (90 versus 94).
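One possible explanation, sketched here with the standard library only: `difflib.SequenceMatcher.ratio()` is documented as `2.0 * M / T`, where `M` is the number of matched characters and `T` is the combined length of both strings. That `fuzz.ratio` computes the same quantity and rounds it to an integer is an assumption here, not something confirmed from the fuzzywuzzy source; the helper name `ratio_percent` is hypothetical.

```python
from difflib import SequenceMatcher


def ratio_percent(a: str, b: str) -> int:
    """Similarity as round(100 * 2 * M / T), assumed to mirror fuzz.ratio."""
    matcher = SequenceMatcher(None, a, b)
    # M: total size of all matching blocks found by SequenceMatcher
    matched = sum(block.size for block in matcher.get_matching_blocks())
    # T: combined length of both strings
    total = len(a) + len(b)
    return round(100 * 2 * matched / total)


examples = [
    ("vendedor", "vendedora"),      # appended character: M = 8, T = 17
    ("estagiário", "estagiária"),   # substituted character: M = 9, T = 20
    ("abcdefghijlmnopqrst", "abcdefghijlmnopqrsty"),
    ("abc", "abcd"),
    ("a", "ab"),
]

scores = [ratio_percent(a, b) for a, b in examples]
print(scores)
```

Under this formula a substitution costs more than an appended character, because a substitution removes one match while leaving the combined length unchanged. That would account for "estagiário"/"estagiária" scoring lower than "vendedor"/"vendedora" despite being longer.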
How does this work?
I am currently matching user-entered job titles, trying to connect mistyped names with their correctly typed versions. Is there a better package for this task?