I have a list of company names that are not spelled consistently. The data set looks like this:
df['Name'] = ['Google', 'google', 'Google.inc', 'Google Inc.', 'Google.com']
I have about 500,000 rows, and the names should be standardized in the best way possible.
My code is below:
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
import pandas as pd

get_match = []
for row in df.index:
    # .at replaces get_value, which was removed in pandas 1.0
    name1 = df.at[row, "Name"]
    for columns in df2.index:
        name2 = df2.at[columns, "Name"]
        # This is the line I suspect is wrong
        matched_token = [process.extract(x, name2, limit=3) for x in name1]
        get_match.append([matched_token, name1, name2])

# Collect the matches and both names into a new DataFrame
df_maneet = pd.DataFrame({
    'Ratio': [i[0] for i in get_match],
    'name1': [i[1] for i in get_match],
    'name2': [i[2] for i in get_match],
})
My result in matched_token is
[[('google', 100, 0), ('Sxyzdgg.', 48, 9), ('ggigsk', 45, 2)]]
but I want to append the result to df and see output like below.
I think I am doing something wrong in the matched_token line, but I can't figure out what.
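To make the goal more concrete, here is a minimal sketch of the kind of flattening I am after. It assumes each tuple in matched_token is (matched name, score, row index), which is how the output above appears to be laid out. The names query_names, rows, and the dictionary keys are hypothetical, made up for illustration only.

```python
# Sketch only: turn the nested match results into one flat row per candidate.
# Assumption: each tuple is (matched_name, score, row_index), as in the output shown.
matched_token = [[('google', 100, 0), ('Sxyzdgg.', 48, 9), ('ggigsk', 45, 2)]]

# Hypothetical: the name that each inner list of candidates was matched against.
query_names = ['Google']

rows = [
    {'name1': query, 'match': match, 'score': score, 'index2': idx}
    for query, candidates in zip(query_names, matched_token)
    for match, score, idx in candidates
]

for row in rows:
    print(row)
```

A list of dictionaries like rows can be passed straight to pd.DataFrame(rows) to get one column per key, which is roughly the shape I would like to end up with.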
Thanks in advance