I have a DataFrame with company names:
df:
company_name |
---|
abc Inc |
abc Inc Bolingbrook |
enterprise badh Shah |
enterprise Financial |
enterprise Financial Shah |
bass Dance |
bass School of Dance |
david Warner |
david Warner Real Estate Inc |
david Warneranita sampath |
Dr anitha sampath |
Dranil kumar Gyan prasad |
Dranil and kumar Mortgage Corporation |
Drbadh Shah |
Drvenky Patel |
Drs krishna and Rama lingam |
I want to standardize company_name so that the output looks like this:
expected output df:
company_name | standardized_company_name |
---|---|
abc Inc | abc Inc |
abc Inc Bolingbrook | abc Inc |
enterprise badh Shah | enterprise Financial |
enterprise Financial | enterprise Financial |
enterprise Financial Shah | enterprise Financial |
bass Dance | bass School of Dance |
bass School of Dance | bass School of Dance |
david Warner | david Warner |
david Warner Real Estate Inc | david Warner |
david Warneranita sampath | david Warner |
Dr anitha sampath | anitha sampath |
Dranil kumar Gyan prasad | anil kumar |
Dranil and kumar Mortgage Corporation | anil kumar |
Drbadh Shah | badh Shah |
Drvenky Patel | venky Patel |
Drs krishna and Rama lingam | krishna and Rama lingam |
NOTE: the standardization does not have to follow fixed rules, but similar company_names should get the same standardized_company_name.
For example, standardized_company_name could also look like this:
company_name | standardized_company_name |
---|---|
abc Inc | abc |
abc Inc Bolingbrook | abc |
enterprise badh Shah | enterprise |
enterprise Financial | enterprise |
enterprise Financial Shah | enterprise |
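One way to read "similar names share one standardized name" is sketched below. It is only an illustration under a strong assumption: names count as similar when they share the same first word (case-insensitive), and the standardized name is the run of leading words common to the whole group. The helper names `common_word_prefix` and `build_mapping` are hypothetical, and names are assumed to be non-empty strings.

```python
from collections import defaultdict

def common_word_prefix(names):
    """Return the leading words shared by every name in the group."""
    prefix = []
    # zip stops at the shortest name, so the prefix never exceeds it
    for words in zip(*(name.split() for name in names)):
        if len({w.lower() for w in words}) != 1:
            break
        prefix.append(words[0])
    return " ".join(prefix)

def build_mapping(names):
    """Map each name to the word prefix shared by its first-word group."""
    groups = defaultdict(list)
    for name in names:
        # Assumption: names with the same first word belong together
        groups[name.split()[0].lower()].append(name)
    return {name: common_word_prefix(group)
            for group in groups.values() for name in group}

names = [
    "abc Inc", "abc Inc Bolingbrook",
    "enterprise badh Shah", "enterprise Financial", "enterprise Financial Shah",
]
mapping = build_mapping(names)
```

With this sample, the two abc rows map to "abc Inc" and the three enterprise rows map to "enterprise". On the DataFrame, the result could be applied with `df['company_name'].map(mapping)`.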
I tried removing stopwords using a regex replace, but it was not effective. Thanks in advance.
I also tried splitting on spaces and keeping only the first word:
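def func(val):
    # Keep only the first space-separated word of the name
    return val.split(' ', 1)[0]

# .copy() avoids pandas' SettingWithCopyWarning when adding the new column
name = unique[['company_name','state']].copy()
name['standardized_company_name'] = name['company_name'].apply(func)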
But this is the output I got:
company_name | standardized_company_name |
---|---|
abc Inc | abc |
abc Inc Bolingbrook | abc |
enterprise badh Shah | enterprise |
enterprise Financial | enterprise |
enterprise Financial Shah | enterprise |
bass Dance | bass |
bass School of Dance | bass |
david Warner | david |
david Warner Real Estate Inc | david |
david Warneranita sampath | david |
Dr anitha sampath | Dr |
Dranil kumar Gyan prasad | Dranil |
Dranil and kumar Mortgage Corporation | Dranil |
Drbadh Shah | Drbadh |
Drvenky Patel | Drvenky |
Drs krishna and Rama lingam | Drs |
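The last six rows show why taking the first word fails for the doctor names: the title is either the whole first word ("Dr", "Drs") or fused to the name ("Drbadh"). A minimal sketch of stripping it first is below. The pattern and the helper name `strip_title` are my own assumptions, and the pattern would also wrongly trim any real company name that merely starts with "Dr".

```python
import re

# Assumption: a leading "Drs " or "Dr" (with or without a space) is a title
TITLE_RE = re.compile(r"^(?:Drs\s+|Dr\s*)")

def strip_title(name):
    """Remove a leading 'Dr'/'Drs' title, even when fused to the next word."""
    return TITLE_RE.sub("", name)

samples = [
    "Dr anitha sampath",
    "Drbadh Shah",
    "Drvenky Patel",
    "Drs krishna and Rama lingam",
    "abc Inc",
]
cleaned = [strip_title(s) for s in samples]
```

Here `cleaned` holds "anitha sampath", "badh Shah", "venky Patel", "krishna and Rama lingam" and an unchanged "abc Inc". On the DataFrame, the same pattern could be passed to `df['company_name'].str.replace(..., regex=True)` before any grouping step.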