I am fairly new to Python and have been using fuzzywuzzy for fuzzy matching with success. I am wondering, however, whether there is a way to exclude terms from the algorithm. Generic terms often match a large number of options, and I would like to prevent the algorithm from matching on those terms without doing a lot of pre-processing. I cannot find any examples or documentation for this.

1 Answer

You could use the built-in difflib module for this.

import difflib

search_list = ['ape', 'apple', 'peach', 'puppy']

# Candidates whose similarity ratio to 'appel' is at least 0.6, best first
matches = difflib.get_close_matches('appel', possibilities=search_list, cutoff=0.6)
print(matches)
# ['apple', 'ape']

# Drop any match that appears in the exclusion list
exclude_list = ['ape']
matches_with_exclusion = [x for x in matches if x not in exclude_list]
print(matches_with_exclusion)
# ['apple']
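
Note that the filter above removes whole candidates after matching. If the goal is instead to keep generic words from influencing the score, one option is to strip those words from both the query and the candidates before comparing. The following is only a minimal sketch using difflib: GENERIC_TERMS, strip_terms and match_ignoring_terms are hypothetical names made up for illustration, and the company names are invented sample data.

```python
import difflib

# Hypothetical set of generic terms to ignore while scoring (an assumption).
GENERIC_TERMS = {"inc", "llc", "company"}

def strip_terms(text, terms=GENERIC_TERMS):
    """Lowercase the text and drop any whitespace-separated token found in `terms`."""
    return " ".join(tok for tok in text.lower().split() if tok not in terms)

def match_ignoring_terms(query, choices, cutoff=0.6):
    """Match on the stripped strings, then map the hits back to the original choices."""
    stripped_to_originals = {}
    for choice in choices:
        stripped_to_originals.setdefault(strip_terms(choice), []).append(choice)
    hits = difflib.get_close_matches(
        strip_terms(query), list(stripped_to_originals), cutoff=cutoff
    )
    return [orig for hit in hits for orig in stripped_to_originals[hit]]

choices = ["Acme Company", "Apex Company", "Zenith Inc"]
print(match_ignoring_terms("Acme Inc", choices))
# ['Acme Company']
```

The same stripped strings could presumably be handed to fuzzywuzzy instead of difflib; only the scoring call would change, and the mapping back to the original strings would stay the same.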
Matthew Borish
  • This is helpful in a way, but I am trying to get the most out of the fuzzywuzzy package and its Levenshtein distance method. I am hoping the package has functionality to exclude terms, and wondering whether anyone knows of it. – Patrick Williams Apr 10 '20 at 14:45
  • FWIW, I looked at the fuzzywuzzy source code (https://github.com/seatgeek/fuzzywuzzy/tree/master/fuzzywuzzy) and don't see any exclusion functionality like what you're looking for. Can you just use the list comprehension from my answer to remove unwanted fuzzywuzzy results? – Matthew Borish Apr 10 '20 at 15:49
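
For what it's worth, the list-comprehension idea from the last comment carries over to scored results as well. Here is a rough sketch, assuming the results arrive as (choice, score) tuples with a 0-100 score, which is the shape fuzzywuzzy's process.extract returns. To keep the example runnable with only the standard library, the scores are computed with difflib.SequenceMatcher as a stand-in, so the numbers will not match fuzzywuzzy's scorers. score_choices and drop_excluded are hypothetical helper names.

```python
from difflib import SequenceMatcher

def score_choices(query, choices):
    """Return (choice, score) pairs sorted best-first, with scores on a 0-100 scale."""
    scored = [
        (choice, round(100 * SequenceMatcher(None, query, choice).ratio()))
        for choice in choices
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def drop_excluded(scored, exclude):
    """Remove any (choice, score) pair whose choice is in the exclusion list."""
    excluded = set(exclude)
    return [(choice, score) for choice, score in scored if choice not in excluded]

scored = score_choices("appel", ["ape", "apple", "peach", "puppy"])
filtered = drop_excluded(scored, ["ape"])
print(filtered)
```

Since the filter runs after scoring, it is worth asking for a few more results than needed up front, so that enough candidates remain once the excluded ones are dropped.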