Feature Selection and Machine Learning for Merchant Names

Question

I would like to classify/categorize/cluster/group together millions of rows of merchant names to their standardized merchant name. For example:

1. Walmart
2. Walmart NY
3. Walmart #12 AHN
4. Wal3mart
5. Sam's Club

all belong to the standard name of "WALMART". I have several million rows of merchant names and close to 60k standard names, and every month new merchant names come in. The merchant names can contain spelling errors, or be subsidiaries of a bigger merchant, names changed by a merger or acquisition, abbreviations, etc.

Is there a way to train a machine learning algorithm to classify these business names?

My preliminary idea is to represent all the merchant names belonging to one standardized name as a group of vectors, and then use Support Vector Machines to draw hyperplanes between the different standardized merchant names. When a new merchant name comes in, I would represent it as a vector and see which standardized merchant name group it is closest to, using a similarity score (say, cosine similarity).
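
To make the "vector plus similarity score" part of this idea concrete, here is a minimal sketch. It is not an SVM: it swaps in a simpler nearest-profile lookup, where each standardized name gets one summed character-trigram vector and a new name is assigned to the profile with the highest cosine similarity. The trigram choice, the function names, and the tiny training list are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def char_ngrams(name, n=3):
    """Return a bag (Counter) of character n-grams for a merchant name."""
    text = f" {name.lower().strip()} "  # pad so word boundaries form n-grams
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(labeled_names):
    """Sum the n-gram vectors of every known variant per standard name."""
    profiles = defaultdict(Counter)
    for raw, standard in labeled_names:
        profiles[standard].update(char_ngrams(raw))
    return profiles


def classify(name, profiles):
    """Return (best standard name, similarity score) for a new name."""
    vec = char_ngrams(name)
    return max(((std, cosine(vec, prof)) for std, prof in profiles.items()),
               key=lambda pair: pair[1])


# Hypothetical labeled data, for illustration only.
training = [
    ("Walmart", "WALMART"),
    ("Walmart NY", "WALMART"),
    ("Walmart #12 AHN", "WALMART"),
    ("Sam's Club", "WALMART"),
    ("Target", "TARGET"),
    ("Target Store 0412", "TARGET"),
]
profiles = build_profiles(training)
label, score = classify("Wal3mart", profiles)
print(label, round(score, 3))
```

Because "Sam's Club" appears in the labeled data, its trigrams become part of the WALMART profile, which is one way a learned representation can capture relationships that pure string similarity cannot. With 60k standard names, comparing against every profile may be slow, so some form of candidate filtering would probably be needed first.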

However, I would like to know if there is any other way of representing these merchant names as features, or any other algorithm that I can use for this problem. Any brainstorming would be much appreciated. Thanks in advance.

Walmart and variants are possible but there is direct way to relate Sam's Club with Walmart. SVM might be overkill if you don't have very complicated scenarios; you can start with simple `edit distance` type of concept. — abhiieor, Feb 26 '17 at 12:44
Thank you abhiieor for the insights. You mentioned there is a direct way to relate Sam's Club with Walmart. Can you shed some light on this? I have implemented the edit distance type of concept (Soundex to cluster, and then Levenshtein within the clusters to filter out dissimilar strings). I am looking for a machine learning approach, if there is one suitable for this kind of problem! — msksantosh, Feb 28 '17 at 04:36
Oops, silly mistake. I meant "Walmart and variants are possible but there is NO direct way to relate Sam's Club with Walmart." This gives a very good idea about what to do: https://www.linkedin.com/pulse/duplicate-quora-question-abhishek-thakur?deepLinkCommentId=6241980560160493568&anchorTime=1488204135942&trk=hb_ntf_MEGAPHONE_REPLY_TOP_LEVEL_COMMENT Beware, it goes very deep. At the start you may just want to take some pieces from it. — abhiieor, Feb 28 '17 at 11:31
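
For reference, the edit-distance baseline discussed in these comments (Soundex to group candidates, then Levenshtein to filter within a group) could look roughly like the sketch below. This is not the asker's actual implementation: the simplified Soundex, the `match` helper, and the `max_ratio` threshold of 0.4 are assumptions chosen for illustration.

```python
from collections import defaultdict

# Letter -> Soundex digit; vowels, h, w and y have no code.
_CODES = {c: d for d, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items() for c in letters}


def soundex(name):
    """Simplified Soundex: first letter plus up to three digit codes."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # h and w do not reset the previous code
            prev = digit
    return (code + "000")[:4]


def levenshtein(a, b):
    """Edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def match(name, standard_names, max_ratio=0.4):
    """Block by Soundex code, then keep the closest name under max_ratio."""
    blocks = defaultdict(list)
    for std in standard_names:
        blocks[soundex(std)].append(std)
    best = None
    for std in blocks.get(soundex(name), []):
        dist = levenshtein(name.lower(), std.lower())
        ratio = dist / max(len(name), len(std), 1)
        if ratio <= max_ratio and (best is None or ratio < best[1]):
            best = (std, ratio)
    return best[0] if best else None


standards = ["WALMART", "TARGET"]  # hypothetical standard names
print(match("Wal3mart", standards))
print(match("Sam's Club", standards))
```

In this sketch "Wal3mart" is matched to "WALMART", while "Sam's Club" finds no match at all, which illustrates the point made above: string similarity alone cannot recover a subsidiary relationship, so that link has to come from labeled data or an external mapping.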

