
I want to harmonise supplier names that are spelled in different ways (see the Supplier_Name column). I have already cleaned them up to some extent; the current result is in the Refined column.

This is the situation:

I want to replace the values in Refined with their best matches from Supplier_Name_Grouped, using a fuzz ratio.

I adapted an example I found online and have got this far:

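from fuzzywuzzy import fuzz  # assumption: the import is not shown; thefuzz also provides fuzz.ratio

SMG1 = list(df["Refined"])  # names to be matched
RF1 = list(df.Supplier_Name_Grouped.unique())  # distinct harmonised names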

def match_names(name, list_names, min_score=0):
    # Return the best-scoring candidate and its score, or ('', -1) if none beats min_score.
    max_score = -1
    max_name = ''
    for x in list_names:
        score = fuzz.ratio(name, x)
        if score > min_score and score > max_score:
            max_name = x
            max_score = score
    return (max_name, max_score)


names = []

for x in SMG1:
    match = match_names(x, RF1, 5)  # min_score = 5
    if match[1] >= 75:  # keep only matches scoring at least 75
        name = ('('+str(x), str(match[0])+')')
        names.append(name)


names

This code compares each refined supplier name with the harmonised suppliers and keeps the matches that score at least 75. The output is a list:

This works quite well.

Now I want to replace the values in the Refined column of my data frame with these matches (as I would in Excel).
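To make the goal concrete, here is a rough sketch of the kind of replacement I have in mind: build a dictionary from each Refined value to its best match, then look every value up in it. The helper names (ratio, build_replacements) and the sample supplier names are made up for illustration, and difflib from the standard library stands in for fuzz.ratio so the snippet runs on its own; its scores may differ slightly from the fuzz library's.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    # Stand-in for fuzz.ratio (assumption): similarity score from 0 to 100.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def build_replacements(values, candidates, threshold=75, scorer=ratio):
    """Map each distinct value to its best-scoring candidate, if it scores >= threshold."""
    mapping = {}
    for value in set(values):
        scored = [(scorer(str(value), str(c)), c) for c in candidates]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score >= threshold:
            mapping[value] = best
    return mapping


# Hypothetical sample data standing in for the two DataFrame columns.
refined = ["Acme Corp", "Acme Corp.", "Globex Gmbh", "Initech"]
grouped = ["Acme Corp.", "Globex GmbH"]

mapping = build_replacements(refined, grouped)

# Values without a good enough match are left unchanged.
harmonised = [mapping.get(v, v) for v in refined]

# With pandas, the same mapping could presumably be applied as:
#   df["Refined"] = df["Refined"].replace(mapping)
```

With the real data, refined would be df["Refined"] and grouped would be df.Supplier_Name_Grouped.unique(), and fuzz.ratio could be passed in as the scorer argument.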
