
I am trying to fuzzy match company names from two CSV files. Each file has the company name in one column and the state where the company is located in another. I want the matching to be conditional on state: if a company from list A is in California, the fuzzy match should only consider companies in list B that are also in California.

The code below works, but it puts the matched name and the match score into a single column, e.g. "('Store XYZ', 75)". Is there a way to split this into two columns and make sure the matched name has no quotes around it?

import pandas as pd
from fuzzywuzzy import fuzz, process

x = pd.read_csv(r'C:\Users\AH\Customers_1.csv')
choices = pd.read_csv(r'C:\Users\AH\Customers_2.csv')


def fuzzy_match(row, x, scorer, cutoff):
    # row is one row of `choices`; only names in x from the same state are candidates
    match = process.extractOne(row['Choices_Customer_Name'],
                               choices=x.loc[x['A_State'] == row['B_State'], 'X_Customer_Name'],
                               scorer=scorer,
                               score_cutoff=cutoff)
    if match:
        # returns a (name, score) tuple, which ends up in a single column
        return match[0], match[1]


choices['Customer_Name_Match'] = choices.apply(fuzzy_match, args=(x, fuzz.token_sort_ratio, 70), axis=1)

print(choices)
choices.to_excel(r'C:\Users\AH\output.xlsx')
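
For reference, here is a minimal sketch of the two-column output I am after. It is not my real code: the toy DataFrames, the `similarity` scorer (built on the standard library's `difflib` so it runs without fuzzywuzzy) and the `Match_Score` column name are all made up for illustration. The idea is to return a plain tuple from the row function and let `apply(..., result_type="expand")` spread it across two columns, which might be one way to do it.

```python
import difflib

import pandas as pd

# Hypothetical toy data standing in for the two CSV files.
x = pd.DataFrame({
    "X_Customer_Name": ["Store XYZ", "Acme Corp", "Store XYZ Inc"],
    "A_State": ["CA", "CA", "NY"],
})
choices = pd.DataFrame({
    "Choices_Customer_Name": ["Store XYZ LLC", "Acme Corporation", "Nothing Similar"],
    "B_State": ["CA", "CA", "TX"],
})


def similarity(a, b):
    """Stand-in scorer on a 0-100 scale (an assumption, not fuzzywuzzy's scorer)."""
    return round(100 * difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio())


def fuzzy_match(row, x, scorer, cutoff):
    # Only names from the same state are candidates.
    candidates = x.loc[x["A_State"] == row["B_State"], "X_Customer_Name"]
    scored = [(name, scorer(row["Choices_Customer_Name"], name)) for name in candidates]
    scored = [pair for pair in scored if pair[1] >= cutoff]
    if not scored:
        # Always return two values so the result expands to two columns.
        return (None, None)
    return max(scored, key=lambda pair: pair[1])


# result_type="expand" turns each returned tuple into separate columns.
choices[["Customer_Name_Match", "Match_Score"]] = choices.apply(
    fuzzy_match, args=(x, similarity, 70), axis=1, result_type="expand"
)

print(choices)
```

With the name and score in separate columns, the name is stored as an ordinary string rather than as part of a tuple, so the quotes and parentheses should no longer appear in the exported file.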
Alex