
I'm trying to compare two lists and get a distance ratio for each item in the list. My code below raises AttributeError: 'Series' object has no attribute 'fuzz'. How do I fix this?

'differences' is the result of my earlier code, a list of companies produced by an exact-match comparison, and df['Company'] is the column in my dataframe that I'm trying to compare it with.

from fuzzywuzzy import fuzz
from fuzzywuzzy import process
str1 = ['differences']
str2 = df['Company']
print ("distance {} -> {}: {}".format(str1,str2.fuzz.ratio(str1,str2)))
maxbachmann
ellaw

2 Answers

from fuzzywuzzy import fuzz

str1 = ['differences']
str2 = ['abcd', 'differ']
# Compare every element of str1 with every element of str2.
for x in str1:
    for y in str2:
        print("distance {} -> {}: {}".format(x, y, fuzz.ratio(x, y)))


To run this on your data, replace str2 with df['Company'] and str1 with your differences list.

Mehul Gupta
  • Just tried this and it still doesn't work. Just to clarify, ['differences'] and df['Company'] are lists of items, and I'm trying to compare item A in the 'differences' list with item A in the column df['Company'], then loop over B, C, D ... Z. Hope this clarifies my original question – ellaw Jun 08 '20 at 06:29
  • ['differences'] is a list of 1 element or is differences some other list you have put in inverted commas? Also, can you show me the error/output after running this code block – Mehul Gupta Jun 08 '20 at 06:31
  • 'differences' is a list of 60 elements. ValueError: Lengths must match to compare – ellaw Jun 08 '20 at 06:52
  • So do you wish to match each element of differences to each element of df['Company']? If differences has 60 elements and df['Company'] has 60 elements, that is a total of 3600 comparisons. Or do you want element-wise matching, i.e. the 1st element of differences matched with the 1st element of df['Company'], producing 60 results? – Mehul Gupta Jun 08 '20 at 06:54
  • Ah, I see the problem with what I'm trying to do now. Neither list is arranged in any order. 'differences' is the shorter list and df['Company'] is the master list. I'd like each element in 'differences' to be compared with the closest matching element in df['Company'] and to obtain a ratio from it. This way I can filter out more close matches. I'm after the difference (i.e. the elements in the 'differences' list that are not in the df['Company'] list) – ellaw Jun 08 '20 at 07:00

According to your comments, it appears you would like to iterate over a list and find the closest match for each element in a pandas Series. This answer uses RapidFuzz, since it is faster than fuzzywuzzy, but it would work in much the same way with fuzzywuzzy. To find the closest match in an iterable you can use process.extractOne, which returns a tuple (match, score) for a normal list, or a tuple (match, score, key) for objects that provide an .items() method, such as a dict or a pandas.Series.

from rapidfuzz import process, fuzz

short_list = ['differences']
companies = df['Company']
for x in short_list:
    # For a pandas Series, extractOne returns (match, score, index label).
    match = process.extractOne(x, companies, scorer=fuzz.ratio, processor=None)
    print("best match for {} is {} with a score of {} at the index {}"
          .format(x, match[0], match[1], match[2]))
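
Since the goal stated in the comments is to keep only the elements of the short list that have no close match in the master list, the best-match score can be compared against a cutoff. Below is a minimal sketch of that idea. To keep it runnable without third-party packages it uses difflib.SequenceMatcher from the standard library as a stand-in for fuzz.ratio, so the scores may differ slightly from RapidFuzz. The helper names (ratio, best_match, unmatched), the sample company names and the threshold of 90 are all assumptions made for illustration, not part of the original question.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Similarity score from 0 to 100 (stand-in for fuzz.ratio)."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def best_match(query, choices):
    """Return (choice, score) for the highest-scoring choice."""
    return max(((c, ratio(query, c)) for c in choices),
               key=lambda pair: pair[1])


def unmatched(short_list, master_list, threshold=90):
    """Items of short_list whose best score in master_list is below threshold."""
    result = []
    for item in short_list:
        match, score = best_match(item, master_list)
        if score < threshold:
            result.append((item, match, score))
    return result


# Hypothetical sample data standing in for `differences` and df['Company'].
differences = ["Acme Corp", "Globex Inc", "Initech"]
companies = ["Acme Corp.", "Globex Incorporated", "Umbrella"]

for item, match, score in unmatched(differences, companies, threshold=90):
    print("{} has no close match (best: {} at {:.0f})".format(item, match, score))
```

With RapidFuzz or fuzzywuzzy the same filtering can probably be done more directly by passing score_cutoff to process.extractOne and treating a None result as "no close match". The right threshold depends on the data, so it is worth checking a few borderline pairs by hand.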

maxbachmann