Here is my dataframe:
import pandas as pd
from fuzzywuzzy import fuzz, process

df = pd.DataFrame(
    dict(Name=['Emma Howard', 'Emma Ward', 'Emma Warner', 'Emma Wayden'],
         Age=[33, 34, 43, 44], Score=[90, 95, 93, 92])
)
list2 = df['Name'].tolist()
For each name i in the Name column, I am applying fuzzywuzzy's process.extractBests:
process.extractBests(i, list2, score_cutoff=80, scorer=fuzz.ratio)
to extract the best matches from the Name column, and it gives the result below:
The logic I want is this: "Emma Howard" and "Emma Ward" are already matched in the first row, so I do not want "Emma Howard" to appear among the matches in the second row, and likewise for the third and fourth rows.
Here is the complete code I have so far:
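mat1 = []
list1 = df['Name'].tolist()
list2 = df['Name'].tolist()
list3 = df['Name'].tolist()
for i in list1:
    # Drop only the current name so it is not matched against itself
    list2 = [x for x in list2 if x != i]
    mat1.append(process.extractBests(i, list2, score_cutoff=80, scorer=fuzz.ratio))
    # Restore the full list, so names from earlier rows become candidates again
    list2 = list3
df['matches'] = mat1
df.to_csv("xyz.csv")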
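To make the intended logic concrete, here is a minimal sketch of the behaviour I am after: each name is compared only with the names that come after it in the column, so a pair that was already reported in an earlier row is not repeated. It is only an illustration under some assumptions: the helper names `ratio` and `later_matches` are hypothetical, and `difflib.SequenceMatcher` from the standard library stands in for `fuzz.ratio` so the snippet runs without fuzzywuzzy. Its scores should be roughly comparable but are not guaranteed to be identical.

```python
from difflib import SequenceMatcher

names = ['Emma Howard', 'Emma Ward', 'Emma Warner', 'Emma Wayden']


def ratio(a, b):
    # Assumption: stand-in for fuzz.ratio, similarity on a 0-100 scale
    return round(100 * SequenceMatcher(None, a, b).ratio())


def later_matches(names, cutoff=80):
    # Hypothetical helper: for each name, score only the names after it
    result = []
    for pos, name in enumerate(names):
        candidates = names[pos + 1:]  # earlier names are never revisited
        scored = [(other, ratio(name, other)) for other in candidates]
        kept = [pair for pair in scored if pair[1] >= cutoff]
        # Best score first, similar to how extractBests orders its output
        result.append(sorted(kept, key=lambda pair: pair[1], reverse=True))
    return result


matches = later_matches(names)
```

With the real library, the same idea would presumably amount to passing `list1[pos + 1:]` as the choices to `process.extractBests` instead of rebuilding and resetting `list2` on every iteration.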