I want to harmonise supplier names that appear with different spellings in the column Supplier_Name. I have already cleaned them up to some extent, and the result is in the column Refined.
I now want to replace each value in Refined with its best match from Supplier_Name_Grouped, using a fuzzy-matching ratio (fuzz.ratio).
Based on an example I found online, this is what I have so far:
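from fuzzywuzzy import fuzz  # assumed source of fuzz.ratio (thefuzz exposes the same function)

SMG1 = list(df["Refined"])
RF1 = list(df.Supplier_Name_Grouped.unique())

def match_names(name, list_names, min_score=0):
    # Return the best-scoring candidate and its score, or ('', -1) if none beats min_score
    max_score = -1
    max_name = ''
    for x in list_names:
        score = fuzz.ratio(name, x)
        if (score > min_score) and (score > max_score):
            max_name = x
            max_score = score
    return (max_name, max_score)

names = []
for x in SMG1:
    match = match_names(x, RF1, 5)
    # Keep only matches with a score of at least 75
    if match[1] >= 75:
        name = ('('+str(x), str(match[0])+')')
        names.append(name)
names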
This code compares each refined supplier name against the harmonised suppliers and keeps the best match when its score is at least 75. The output is a list of (refined name, best match) pairs.
I now want to replace the values in the Refined column of my data frame (as I would in Excel) with these matches.
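To make the goal concrete, here is a minimal sketch of the replacement I have in mind. It uses only the standard library: difflib.SequenceMatcher stands in for fuzz.ratio, and the sample names and the helpers ratio and best_match are made up for illustration, not taken from my data.

```python
from difflib import SequenceMatcher

def ratio(a, b):
    # Stand-in for fuzz.ratio: similarity score on a 0-100 scale
    return round(100 * SequenceMatcher(None, a, b).ratio())

def best_match(name, candidates, min_score=75):
    # Return the highest-scoring candidate, or None if it scores below min_score
    best = max(candidates, key=lambda c: ratio(name, c))
    return best if ratio(name, best) >= min_score else None

# Hypothetical sample values standing in for the two columns
refined = ["Siemens AG.", "Bosch Gmbh", "Initech"]
grouped = ["Siemens AG", "Bosch GmbH"]

# Build a mapping {refined name: harmonised name} for names that have a good match
mapping = {}
for name in set(refined):
    match = best_match(name, grouped)
    if match is not None:
        mapping[name] = match

# Apply the mapping; names without a good match stay unchanged
replaced = [mapping.get(name, name) for name in refined]
print(replaced)
```

On the real data frame, I assume the same mapping could be applied with df["Refined"] = df["Refined"].replace(mapping), which would leave unmatched names as they are.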