
I am trying to learn and implement fuzzy matching in Python. I have two data sets, which I load into pandas as data frames. Set 1 is the reference set. Set 2 contains the names to match against the reference names.

I loop through the items in set_1 to search for the corresponding entries in the reference, but I get an error and need some help with it.
Am I structuring the algorithm in a sensible way?

My attempt:

import pandas as pd
import fuzzywuzzy as fuzzy
from difflib import SequenceMatcher

set_1 = pd.read_csv("C:/Folder/file_1.csv")
set_2 = pd.read_csv("C:/Folder/file_2.csv")

query = set_1['name']
choices = set_2['name2']

for query in query:
    match = fuzzy.extractOne(query,choises=choises,scorer=scorer,score_cutoff=cutoff)

I get the following error:

AttributeError: module 'fuzzywuzzy' has no attribute 'extractOne'
Chris

1 Answer

If you take a look at the package's usage on GitHub, you'll notice that extractOne is a function defined in fuzzywuzzy.process, so you'll need to import that submodule like so:

import pandas as pd
from fuzzywuzzy import fuzz, process  # <-- note the difference
from difflib import SequenceMatcher

set_1 = pd.read_csv("C:/Folder/file_1.csv")
set_2 = pd.read_csv("C:/Folder/file_2.csv")

queries = set_1['name']
choices = set_2['name2']

# The scorer and cutoff below are example values, not from the question;
# pick whatever suits your data.
for query in queries:
    #       vvvvvvv  note the difference
    # extractOne returns None when no choice reaches score_cutoff
    match = process.extractOne(query, choices, scorer=fuzz.token_set_ratio, score_cutoff=70)
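
To see what a call like this does conceptually, here is a minimal stand-in built on difflib.SequenceMatcher, which the question already imports. It is only a sketch, not fuzzywuzzy's implementation: the names `similarity` and `best_match`, the sample reference list, and the cutoff of 60 are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score, ignoring case and outer whitespace."""
    a, b = a.strip().lower(), b.strip().lower()
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(query, choices, score_cutoff=0):
    """Return (choice, score) for the highest-scoring choice,
    or None if no choice reaches score_cutoff."""
    best = None
    for choice in choices:
        score = similarity(query, choice)
        if score >= score_cutoff and (best is None or score > best[1]):
            best = (choice, score)
    return best


# Hypothetical reference names, standing in for a column of set_2.
reference = ["Acme Corporation", "Globex Inc", "Initech LLC"]

print(best_match("acme corporation", reference, score_cutoff=60))
print(best_match("zzz", reference, score_cutoff=60))
```

The point of the cutoff is that a query with no reasonable counterpart yields None instead of a misleading low-quality match, so the calling loop should check for None before unpacking the result.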
PaSTE
  • I changed it as suggested above and also changed the loop to `for row in names.itertuples(): print(row[1]); match = process.extractOne(row[1], reference, partial_token_set_ratio, score_cutoff=70)`, but I still cannot get it working. This time there is a different error: `NameError: name 'partial_token_set_ratio' is not defined` – Chris May 06 '18 at 23:43
  • `partial_token_set_ratio` is defined in the `fuzz` submodule, so it has to be imported too. Try adding `from fuzzywuzzy import fuzz` to your imports and passing `scorer=fuzz.partial_token_set_ratio`, then see what happens. – PaSTE May 07 '18 at 00:13
  • It seems that using `apply` is a better approach than a loop, as demonstrated here: http://blog.keyrus.co.uk/fuzzy_matching_101_part_ii.html – Chris May 08 '18 at 19:19
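
The `apply` approach mentioned in the last comment could look roughly like the sketch below. To keep it runnable without the CSV files, it uses two small made-up data frames, and it uses the standard library's difflib.get_close_matches as the matcher rather than fuzzywuzzy; a function wrapping process.extractOne could be passed to `apply` in the same way. The helper name `closest` and the 0.6 cutoff are assumptions for illustration.

```python
import difflib

import pandas as pd

# Hypothetical stand-ins for the two CSV files in the question.
set_1 = pd.DataFrame({"name": ["Acme Corp", "Globex Inc", "Unknown Ltd"]})
set_2 = pd.DataFrame({"name2": ["ACME Corporation", "Globex Incorporated", "Initech"]})

choices = set_2["name2"].tolist()
# Map lowercased names back to their original spelling for case-insensitive matching.
lowered = {c.lower(): c for c in choices}


def closest(name, cutoff=0.6):
    """Return the closest entry in choices, or None if nothing reaches cutoff."""
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


# One call per row; the result lands in a new column aligned with set_1.
set_1["match"] = set_1["name"].apply(closest)
print(set_1)
```

Note that `apply` still calls the function once per row, so the gain is mostly that the matches stay aligned with the original rows rather than any large speed-up.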