I want to take a list of customer names and compare them against an internal database to find a highly likely match for each one and return the corresponding customer code.
So I would receive a list of customers like this:
Cx Name |
---|
Chicken C. |
Water Gmbh |
Computer ldt |
Food, Glorious Food |
and I want to compare it to an internal database like this:
Cx Name database | Cx Number |
---|---|
Tech Co. | 9123 |
Computer LTD. | 8123 |
Chicken Co. | 7123 |
Water Gmbh | 6123 |
and return something like this:
Cx Name | Cx Suggestion |
---|---|
Chicken C. | 7123 |
Water Gmbh | 6123 |
Computer ldt | 8123 |
I was thinking of using a loop and stringdist to compare each cx name to every name in the database and return the customer number of the highest-scoring match, but only if that score is above a 90% match. I'm not sure how best to approach this, and my loop skills in R are a bit rusty.
This is obviously a very crude example. Typically I would do a bit of data cleaning beforehand, and I would be working with about 500 different customers matched against a database of 5,000 to 10,000 customer names.
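
To make the idea concrete, here is a minimal sketch of the approach described above, run on the example tables. It rests on several assumptions that are not settled in the question: it uses `stringdistmatrix()` from the stringdist package instead of an explicit loop, it measures similarity as 1 minus the Jaro-Winkler distance (`method = "jw"`, `p = 0.1`), it lowercases names as the only cleaning step, and it treats "90% match" as a similarity of at least 0.9. The helper `suggest_codes()` and the column names are hypothetical, made up for illustration.

```r
library(stringdist)

# Incoming customer names (from the example above)
incoming <- data.frame(
  cx_name = c("Chicken C.", "Water Gmbh", "Computer ldt", "Food, Glorious Food"),
  stringsAsFactors = FALSE
)

# Internal database (from the example above)
database <- data.frame(
  cx_name   = c("Tech Co.", "Computer LTD.", "Chicken Co.", "Water Gmbh"),
  cx_number = c(9123, 8123, 7123, 6123),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each incoming name, suggest the code of the most
# similar database name, or NA if the best similarity is below `threshold`.
suggest_codes <- function(incoming_names, db_names, db_codes, threshold = 0.9) {
  # Distance matrix: one row per incoming name, one column per database name.
  # Jaro-Winkler distance lies in [0, 1]; lowercasing is an assumed cleaning step.
  d <- stringdistmatrix(tolower(incoming_names), tolower(db_names),
                        method = "jw", p = 0.1)
  sim <- 1 - d

  # Column index of the best match in each row, and its similarity
  best     <- max.col(sim, ties.method = "first")
  best_sim <- sim[cbind(seq_along(incoming_names), best)]

  data.frame(
    cx_name       = incoming_names,
    cx_suggestion = ifelse(best_sim >= threshold, db_codes[best], NA),
    similarity    = round(best_sim, 3),
    stringsAsFactors = FALSE
  )
}

result <- suggest_codes(incoming$cx_name, database$cx_name, database$cx_number)
print(result)
```

On the example data this suggests 7123, 6123 and 8123 for the first three names and `NA` for "Food, Glorious Food", which has no database name above the threshold. At the sizes mentioned (about 500 names against up to 10,000), the full matrix holds about 5 million numbers, roughly 40 MB, so computing it in one call should be feasible. Whether Jaro-Winkler and a 0.9 cutoff are the right choices for real customer names is something that would need checking against actual data.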