In the sample data below, I've listed the employer names of a particular person (a prospective customer) as we received them from two different sources. I've been trying to find a way to match the two names more reliably. (Currently, this is done manually.) I don't think I'm attempting the impossible, but if it isn't achievable, please don't be harsh!
The pairs below were each confirmed as a "match" by manual verification (name from source 1 on the left, name from source 2 on the right).
ADDUS==============================================Addus Home Care
Amazon.com, Inc. and its affiliates=====================Amazon.com
Aon========================================Aon Service Corporation
ARAMARK Food & Support Svc.================================Aramark
AT&T Mobility Services LLC===========================AT&T Mobility
CDW, LLC===========================================CDW Corporation
Lurie Children's Hospital of Chicago======Lurie Childrens Hospital
Securitas Security Services USA, Inc============Securitas security
The PNC Financial Services Group, Inc.======================PNC NA
United States Department of Homeland Security====US Homeland Securiti
TCS=========================================Tata Consultancy Services
Although they are almost obvious, let me state the main difficulties for the sake of emphasis.
- There might be spelling mistakes in the names from either source
- There might be abbreviations (e.g. TCS in one place and Tata Consultancy in another)
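To make the two difficulties concrete, here is a minimal sketch of the kind of scoring I have in mind, using only the Python standard library. It is an assumption on my part, not a proven method: the `STOPWORDS` list and the helpers `normalize`, `is_acronym` and `similarity` are hypothetical names I made up for illustration. It normalizes both names, treats a single token that spells out the initials of the other name as a match, and otherwise fuzzy-matches tokens so that small spelling mistakes are tolerated.

```python
import re
from difflib import SequenceMatcher

# Legal-form and filler words to ignore. This list is an assumption and far from exhaustive.
STOPWORDS = {"inc", "llc", "corporation", "corp", "the", "and", "its", "affiliates", "of"}

def normalize(name):
    """Lowercase, drop apostrophes and punctuation, remove stopwords; return a token list."""
    name = name.lower().replace("'", "")
    tokens = re.findall(r"[a-z0-9]+", name)
    return [t for t in tokens if t not in STOPWORDS]

def is_acronym(short, long_tokens):
    """True if `short` equals the initials of a multi-word name."""
    return len(long_tokens) > 1 and short == "".join(t[0] for t in long_tokens)

def similarity(a, b):
    """Score in [0, 1]: acronym hit, else average best fuzzy match per token of the shorter name."""
    ta, tb = normalize(a), normalize(b)
    if not ta or not tb:
        return 0.0
    if len(ta) == 1 and is_acronym(ta[0], tb):
        return 1.0
    if len(tb) == 1 and is_acronym(tb[0], ta):
        return 1.0
    short, long_ = sorted((ta, tb), key=len)
    # For every token of the shorter name, take its best character-level match in the longer name.
    scores = [max(SequenceMatcher(None, s, l).ratio() for l in long_) for s in short]
    return sum(scores) / len(scores)

print(similarity("TCS", "Tata Consultancy Services"))
print(similarity("United States Department of Homeland Security", "US Homeland Securiti"))
```

A known weakness of this sketch: any name whose tokens all appear in the longer name scores 1.0, so a very short or generic name would be accepted against many unrelated employers. That is exactly the kind of wrong acceptance I want to avoid.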
Please suggest an algorithm or approach that does this with the fewest "wrong acceptance cases", by which I mean pairs that are not real matches but still get high match ratios from different algorithms.
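For comparing candidate algorithms on wrong acceptances, something like the following might serve as a rough harness. It is only a sketch under stated assumptions: it uses a few of the verified pairs above as true matches and assumes that a left-hand name paired with a right-hand name from a different row is a non-match. The scorer `ratio` is just a baseline built on `difflib`, and `evaluate` is a hypothetical helper name.

```python
from difflib import SequenceMatcher

# A few of the manually verified pairs from the sample data above.
MATCHES = [
    ("ADDUS", "Addus Home Care"),
    ("Amazon.com, Inc. and its affiliates", "Amazon.com"),
    ("CDW, LLC", "CDW Corporation"),
    ("AT&T Mobility Services LLC", "AT&T Mobility"),
    ("TCS", "Tata Consultancy Services"),
]

def ratio(a, b):
    """Baseline scorer (an arbitrary choice): case-insensitive difflib ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def evaluate(scorer, threshold):
    """Return (true_accepts, wrong_accepts) at the given threshold.

    Assumption: names taken from different rows of MATCHES never refer to the same employer.
    """
    true_accepts = wrong_accepts = 0
    for i, (left, _) in enumerate(MATCHES):
        for j, (_, right) in enumerate(MATCHES):
            if scorer(left, right) >= threshold:
                if i == j:
                    true_accepts += 1
                else:
                    wrong_accepts += 1
    return true_accepts, wrong_accepts

# Sweep a few thresholds to see the trade-off between the two counts.
for threshold in (0.5, 0.7, 0.9):
    print(threshold, evaluate(ratio, threshold))
```

Any other scoring function with the same `(a, b) -> float` shape can be passed in place of `ratio`, which would let different algorithms be compared on the same pairs.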