I am trying to use the fuzzywuzzy library to compute a similarity score between strings in two datasets using the fuzz.ratio function.

However, I keep getting the following error:

Traceback (most recent call last):
  File "title_matching.py", line 29, in <module>
    match = match_title(title, all_titles_list, 75)
  File "title_matching.py", line 12, in match_title
    score = fuzz.ratio(title, title2)
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/utils.py", line 38, in decorator
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/utils.py", line 29, in decorator
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/utils.py", line 45, in decorator
    if len(args[0]) == 0 or len(args[1]) == 0:
TypeError: object of type 'float' has no len()

Below is the function where I call the library:

from fuzzywuzzy import fuzz

def match_title(title, list_titles, min_score=0):
    # Score of -1 in case we don't get any matches
    max_score = -1
    # Return an empty name when there is no match
    max_name = ""
    # Iterate over all titles in the other dataset
    for title2 in list_titles:
        # Compute the fuzzy match score
        score = fuzz.ratio(title, title2)
        # Keep the title if it beats both the threshold and the best score so far
        if score > min_score and score > max_score:
            max_name = title2
            max_score = score
    return (max_name, max_score)

I printed title and list_titles to check them, and they appear to be a string and a list of strings respectively. I don't understand why this is happening or how to fix it, since the error is raised inside the library file.

iammrmehul

1 Answer

score = fuzz.ratio(title, title2)

Either title or title2 is a float and not a string. As the traceback shows, fuzzywuzzy calls len() on both arguments, and a float has no length.

from fuzzywuzzy import fuzz

print(fuzz.ratio('1', '2'))
# 0
print(fuzz.ratio(1.0, '2'))
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    print(fuzz.ratio(1.0, '2'))
  File "C:\Python37\lib\site-packages\fuzzywuzzy\utils.py", line 38, in decorator
    return func(*args, **kwargs)
  File "C:\Python37\lib\site-packages\fuzzywuzzy\utils.py", line 29, in decorator
    return func(*args, **kwargs)
  File "C:\Python37\lib\site-packages\fuzzywuzzy\utils.py", line 45, in decorator
    if len(args[0]) == 0 or len(args[1]) == 0:
TypeError: object of type 'float' has no len()
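
One way to avoid the error is to skip anything that is not a string before scoring. The following is only a sketch under a few assumptions: match_title_safe and ratio_stand_in are hypothetical names, and ratio_stand_in is a difflib-based substitute used so the example runs without third-party packages. In real code you would pass scorer=fuzz.ratio instead.

```python
from difflib import SequenceMatcher


def ratio_stand_in(a, b):
    """Hypothetical stand-in for fuzz.ratio: a 0-100 similarity score via difflib."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def match_title_safe(title, list_titles, min_score=0, scorer=ratio_stand_in):
    """Return (best_title, best_score), ignoring values that are not strings."""
    max_name, max_score = "", -1
    # A missing query title (e.g. NaN) cannot match anything
    if not isinstance(title, str):
        return (max_name, max_score)
    for title2 in list_titles:
        # Skip NaN, None and any other non-string entries
        if not isinstance(title2, str):
            continue
        score = scorer(title, title2)
        if score > min_score and score > max_score:
            max_name, max_score = title2, score
    return (max_name, max_score)


titles = ["The Matrix", float("nan"), "The Matrix Reloaded", None]
print(match_title_safe("The Matrix", titles, 75))
# ('The Matrix', 100)
```

Whether to skip missing values silently or report them depends on the data; with a very large dataset it may be worth counting the skipped entries.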
DeepSpace
  • Thanks. After extensive logging, I discovered missing values in the datasets, which I had not noticed because the datasets are huge. For some reason, whenever a value was null, Python read it as a float, which confused me. – iammrmehul Dec 05 '18 at 12:08
  • @iammrmehul `np.nan` is a float for multiple historical and practical reasons. https://stackoverflow.com/questions/35323032/numpy-does-treat-floatnan-and-float-differently-convert-to-none – DeepSpace Dec 05 '18 at 12:11
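
To illustrate the comments above: a NaN is an ordinary Python float, so it fails the len() check in the same way as 1.0 does. The sketch below uses only the standard library; is_missing is a hypothetical helper name, and float("nan") stands in for np.nan on the assumption that missing entries arrive as NaN.

```python
import math

missing = float("nan")  # same type as np.nan: a plain Python float

print(type(missing).__name__)
# float

try:
    len(missing)
except TypeError as exc:
    print(exc)
# object of type 'float' has no len()


def is_missing(value):
    """Hypothetical helper: True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Drop missing entries before any fuzzy matching
raw_titles = ["Alien", float("nan"), "", None, "Aliens"]
clean_titles = [t for t in raw_titles if not is_missing(t)]
print(clean_titles)
# ['Alien', '', 'Aliens']
```

Note that an empty string is kept here, since it is a valid string; filter it separately if empty titles should not be matched.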