I have a DataFrame containing two columns of string data. I need to compare the columns 'NameTest' and 'Name': each name in 'NameTest' should be compared to all names in 'Name', and if the similarity is more than 80%, the closest matching name should be printed.
My DataFrame:

| | NameTest | Name |
|---|---|---|
| 0 | john carry | john carrt |
| 1 | alex midlane | john crat |
| 2 | robert patt | alex mid |
| 3 | david baker | alex |
| 4 | NaN | patt |
| 5 | NaN | robert |
| 6 | NaN | david baker |

My code:

    from fuzzywuzzy import fuzz, process
    import pandas as pd
    import numpy as np
    import difflib

    cols = ["Name", "NameTest"]
    df = pd.read_excel(
        r'D:\FFOutput\name.xlsx', usecols=cols,)  # Read Excel
    for i, row in df.iterrows():
        na = row.Name
        ne = row.NameTest
        print([ne, na])
        for i in na:
            c = difflib.SequenceMatcher(isjunk=None, a=ne, b=na)
            diff = c.ratio()*100
            diff = round(diff, 1)
            if diff >= 80:
                print(na, diff)

Any suggestions?
Thank you for your help
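
One possible approach, offered as a minimal sketch rather than a definitive answer: instead of comparing the two columns row by row, loop over every 'NameTest' value and score it against every 'Name' value, keeping only the best score. The helper name `closest_matches` and its `threshold` parameter are hypothetical, the sample lists are copied from the table above with the NaN rows dropped, and the `>= 80` cutoff follows the code in the question (use `>` for a strict "more than 80%").

```python
import difflib


def closest_matches(test_names, names, threshold=80.0):
    """Return (test_name, best_match, score) for each test name whose
    best SequenceMatcher score against `names` reaches `threshold`."""
    results = []
    for test_name in test_names:
        best_name, best_score = None, 0.0
        # Compare this test name against every candidate, not just one row.
        for name in names:
            score = difflib.SequenceMatcher(None, test_name, name).ratio() * 100
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= threshold:
            results.append((test_name, best_name, round(best_score, 1)))
    return results


# Sample data from the question's table (NaN rows removed).
# With the real DataFrame, the equivalent lists would be
# df["NameTest"].dropna().tolist() and df["Name"].dropna().tolist().
name_test = ["john carry", "alex midlane", "robert patt", "david baker"]
name = ["john carrt", "john crat", "alex mid", "alex", "patt", "robert", "david baker"]

for test_name, match, score in closest_matches(name_test, name):
    print(test_name, "->", match, score)
```

Dropping NaN first matters because `SequenceMatcher` expects sequences, and a float NaN is not one. Since fuzzywuzzy is already imported in the question, `process.extractOne` with a `score_cutoff` could probably play the same role, though its default scorer differs from `SequenceMatcher.ratio()`, so the scores may not agree exactly.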