Pandas and Fuzzy Match

Question

Currently I have two data frames. I am trying to fuzzy-match client names between them using fuzzywuzzy's process.extractOne function. When I run the following script on sample data I get good results and no errors, but when I run it on my actual data frames I get both an AttributeError and a TypeError. I can't share the data for security reasons, but if anyone can work out from the script why I am getting these errors I would be much obliged.

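import numpy as np
from fuzzywuzzy import process

names2 = list(dftr3['Common Name'])
# Map each reference name to itself so a match can be looked up by key.
names3 = dict(zip(names2, names2))

def get_fuzz_match(row):
    # Best-scoring reference name, or None if nothing reaches the cutoff of 80.
    match = process.extractOne(row['CLIENT_NAME'], choices=names3.keys(), score_cutoff=80)
    if match:
        return names3[match[0]]
    return np.nan

dfmi4['Match Name'] = dfmi4.apply(get_fuzz_match, axis=1)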

I know that without sample data this is harder to troubleshoot, so I will answer any questions and edit the post to help the process along. The specific errors are:

1. AttributeError: 'dict_keys' object has no attribute 'items'

2. TypeError: expected string or buffer

score 1 · Answer 1 · answered Jan 09 '16 at 10:01

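The AttributeError is straightforward and, I think, to be expected. Fuzzywuzzy's process.extract function, which does most of the actual work in process.extractOne, uses a try:... except: clause to decide whether to treat the choices parameter as dict-like or list-like. As the error message shows, it calls choices.items(); a dict_keys object has no items method, so that raises the AttributeError and extract falls back to iterating over choices as a list. I think you are seeing the AttributeError only because the TypeError is raised inside that except: clause, so Python reports both exceptions in the traceback.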
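The try/except pattern described above can be reproduced without fuzzywuzzy. The sketch below is a hypothetical simplification (extract_like and full_process_like are invented names, not fuzzywuzzy's source); it only shows how a TypeError raised inside the except AttributeError: branch carries the AttributeError along as its __context__, which is why the traceback mentions both errors.

```python
import re


def full_process_like(text):
    # Hypothetical stand-in for fuzzywuzzy's string preprocessing: it runs a
    # regex over its input, so a float such as NaN raises TypeError.
    return re.sub(r"\W+", " ", text).strip().lower()


def extract_like(query, choices):
    """Hypothetical simplification of the dict-or-list dispatch described
    above; not fuzzywuzzy's actual source."""
    pairs = []
    try:
        # Dict-like branch: a dict_keys object has no .items(), so this
        # raises AttributeError before anything is appended.
        for key, choice in choices.items():
            pairs.append((full_process_like(query), full_process_like(choice), key))
    except AttributeError:
        # List-like fallback. A non-string query or choice raises TypeError
        # here, i.e. while the AttributeError is still being handled.
        for choice in choices:
            pairs.append((full_process_like(query), full_process_like(choice)))
    return pairs


names3 = {"Acme Corp": "Acme Corp"}

# A string query works: the AttributeError is swallowed by the fallback.
print(extract_like("Acme Corp", names3.keys()))

# A NaN query (e.g. from an empty CLIENT_NAME cell) fails inside the fallback.
try:
    extract_like(float("nan"), names3.keys())
except TypeError as err:
    # Implicit chaining: the AttributeError is attached as __context__.
    print(type(err.__context__).__name__, "->", type(err).__name__)
```

With a string query the AttributeError never surfaces; it only appears in the output because the second exception occurs while it is being handled.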

The TypeError is harder to pin down, but I suspect it comes from fuzzywuzzy's StringProcessor class, which is used by the default string processing that extract calls. StringProcessor applies several string methods to its input and doesn't catch exceptions. So it seems likely that your apply call is passing something that is not a string. Is it possible that you have any empty cells? Pandas represents those as NaN, which is a float, not a string.
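A minimal sketch of the check suggested above, assuming the empty cells arrive as NaN. The sample frames are hypothetical, and the standard library's difflib.get_close_matches stands in for process.extractOne only so the sketch runs without fuzzywuzzy installed. Its 0 to 1 similarity ratio is a different algorithm from fuzzywuzzy's 0 to 100 scores, so in real use you would keep the extractOne call and put the same isinstance guard in front of it.

```python
import difflib

import numpy as np
import pandas as pd

# Hypothetical sample data; the real frames were not shared.
dftr3 = pd.DataFrame({"Common Name": ["Acme Corp", "Globex Inc"]})
dfmi4 = pd.DataFrame({"CLIENT_NAME": ["Acme Corp.", np.nan, 12345, "Globex Inc"]})

# 1. Find rows whose CLIENT_NAME is not a string (NaN from empty cells, numbers, ...).
is_str = dfmi4["CLIENT_NAME"].map(lambda v: isinstance(v, str))
print(dfmi4[~is_str])

names2 = list(dftr3["Common Name"])


def get_fuzz_match(row):
    name = row["CLIENT_NAME"]
    # 2. Skip non-strings instead of passing them to the matcher.
    if not isinstance(name, str):
        return np.nan
    # Stand-in for process.extractOne(name, choices=names2, score_cutoff=80),
    # using difflib so the sketch runs without fuzzywuzzy.
    matches = difflib.get_close_matches(name, names2, n=1, cutoff=0.8)
    return matches[0] if matches else np.nan


dfmi4["Match Name"] = dfmi4.apply(get_fuzz_match, axis=1)
print(dfmi4)
```

Guarding inside the row function keeps non-string rows in the frame with a NaN match, rather than dropping them before the apply.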
