I have two lists of companies (> 2k entries in the longer list) in different formats that I need to unify. I know that the two formats share a common stub about 80% of the time, so I'm using fuzzy matching to compare the two lists:
from fuzzywuzzy import fuzz

def get_fuzz_score(str1, str2):
    # Score (0-100) for the best partial (substring) alignment of the two names
    return fuzz.partial_ratio(str1, str2)

a = ['Express Scripts', 'Catamaran Corp', 'Banmedica SA (96.7892%)', 'WebMD', 'ODC', 'Caremerge LLC (Stake%)']
b = ['Doctor on Demand', 'Catamaran', 'Express Scripts Holding Corp', 'ODC, Inc.', 'WebMD Health Services', 'Banmedica']

for i in b:
    for j in a:
        if get_fuzz_score(i, j) > 80:
            pass  # process the matched pair (i, j)
I'd appreciate thoughts on how to optimize this task for performance (e.g., how to avoid the two nested for loops, which compare every name in one list against every name in the other).
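
One direction that might cut down the number of comparisons, sketched here under assumptions rather than as a tested solution: since the names usually share a stub, index list `a` by its word tokens and only score pairs that have at least one token in common. The helper names `tokens`, `build_index`, and `candidate_pairs` are hypothetical, and the normalization (lowercasing, dropping punctuation) is an assumption about what counts as a shared stub. The sketch uses only the standard library; the surviving pairs would still be passed to `get_fuzz_score`.

```python
from collections import defaultdict

def tokens(name):
    # Assumed normalization: lowercase, treat non-alphanumerics as separators.
    cleaned = ''.join(ch.lower() if ch.isalnum() else ' ' for ch in name)
    return cleaned.split()

def build_index(names):
    # Map each token to the set of names that contain it.
    index = defaultdict(set)
    for name in names:
        for tok in tokens(name):
            index[tok].add(name)
    return index

def candidate_pairs(a, b):
    # Yield only (i, j) pairs that share at least one token,
    # instead of every combination of the two lists.
    index = build_index(a)
    for i in b:
        candidates = set()
        for tok in tokens(i):
            candidates |= index.get(tok, set())
        for j in sorted(candidates):
            yield i, j

a = ['Express Scripts', 'Catamaran Corp', 'Banmedica SA (96.7892%)', 'WebMD', 'ODC', 'Caremerge LLC (Stake%)']
b = ['Doctor on Demand', 'Catamaran', 'Express Scripts Holding Corp', 'ODC, Inc.', 'WebMD Health Services', 'Banmedica']

pairs = list(candidate_pairs(a, b))
for i, j in pairs:
    print(i, '<->', j)  # each pair would then be scored with get_fuzz_score(i, j)
```

On the sample lists this leaves a handful of pairs to score instead of all 36. Two caveats: generic tokens such as "Corp" or "Inc" pull in unrelated candidates (a stop-word list might help), and names that share only part of a word, with no whole token in common, would be missed by this filter.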