
I have two lists of customer names. The names may be similar or different. How can I measure the similarity between these two lists using Python?

Once I have the similarity, I want to pull the corresponding data from one Excel file into the other.

Example:

List 1:

Customer Name       Unique ID
IBM                 2365
BOA                 5456
BMW AG              2456

List 2:

Customer Name     Unique ID
IBM Pvt Ltd        
BMW Group
Robert Bosch
BOA Ltd

This is just sample data. The actual data contains almost 300k rows.

I tried Jaccard similarity by passing the two lists to the function as separate Excel files, but the result is always zero.
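
One possible explanation, offered only as a guess since the original function is not shown: if each whole list is treated as a set of full strings, no two names are identical, so the intersection is empty and the score is zero. A minimal sketch of that situation, and of a per-pair, token-level alternative, is below. The helper names `jaccard` and `tokens` are hypothetical, and the sample names are taken from the lists above.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def tokens(name: str) -> set:
    """Split a name into lowercase whitespace-separated tokens."""
    return set(name.lower().split())


list1 = ["IBM", "BOA", "BMW AG"]
list2 = ["IBM Pvt Ltd", "BMW Group", "Robert Bosch", "BOA Ltd"]

# Comparing the lists as sets of whole strings: nothing is identical.
whole_list_score = jaccard(set(list1), set(list2))

# Comparing one pair of names token by token: {"ibm"} vs {"ibm", "pvt", "ltd"}.
pair_score = jaccard(tokens("IBM"), tokens("IBM Pvt Ltd"))

print(whole_list_score, pair_score)
```

Here the whole-list score is zero because the two sets share no exact string, while the pair score is one shared token out of three distinct tokens.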

Edit: How can I iterate through both lists, compare each element with every element of the other list, and build a distance matrix?

Then I would like to sort each row of that matrix in descending order to find the closest match. Or is there a better way to find the closest match once the matrix is built?
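
To make the idea concrete, here is a minimal sketch of such a matrix on the sample data. It assumes a similarity score (higher is closer) from the standard library's `difflib.SequenceMatcher` rather than a true distance; the `similarity` helper and the `best` dictionary are hypothetical names. Taking the maximum of each row gives the closest match without sorting the whole row.

```python
from difflib import SequenceMatcher

list1 = ["IBM", "BOA", "BMW AG"]
list2 = ["IBM Pvt Ltd", "BMW Group", "Robert Bosch", "BOA Ltd"]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]; 1.0 means identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Rows correspond to list1 entries, columns to list2 entries.
matrix = [[similarity(a, b) for b in list2] for a in list1]

# For each row, keep the column with the highest score.
best = {}
for name, row in zip(list1, matrix):
    j = max(range(len(row)), key=row.__getitem__)
    best[name] = (list2[j], row[j])

for name, (match, score) in best.items():
    print(f"{name!r} -> {match!r} ({score:.2f})")
```

Note that a full matrix for two lists of 300k names would have about 9 x 10^10 cells, so at that scale it would likely be necessary to compare only within smaller candidate groups (for example, names sharing a first token) instead of all pairs.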

1 Answer


Could you elaborate and make your question a little clearer?

What do you mean by similarity between two lists?

When you say list, do you mean a CSV/Excel list or a Python list? If you are looking for the distance between strings, you may want to look at the Levenshtein (edit distance) algorithm: https://www.geeksforgeeks.org/edit-distance-dp-5/

A Python implementation: https://www.python-course.eu/levenshtein_distance.php
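
For reference, a minimal sketch of the Levenshtein distance using the usual dynamic-programming recurrence, keeping only two rows in memory. This is an illustrative version written for this answer, not the code from either linked page; the function name `levenshtein` is my own.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn s into t."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(s) < len(t):
        s, t = t, s

    # previous[j] is the distance between the empty prefix of s and t[:j].
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]  # distance between s[:i] and the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # delete cs
                current[j - 1] + 1,      # insert ct
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
print(levenshtein("IBM", "IBM Pvt Ltd"))
```

A smaller distance means the names are closer, so for matching you would pick the minimum per name rather than the maximum.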

Since your data set is very large, also check the external merge sort strategy.

melvil james
  • Hi @user2623720 - the lists are two separate Excel files containing customer names. I need to match the two lists to check for similar names. If the similarity is high, I need to pull the corresponding data from List 1 and put it in List 2. Hope this makes it clearer. Thank you so much for your answer :) – Akshay Gupta Nov 26 '18 at 10:55
  • What is the metric for measuring similarity, or in other words, what do you mean by similar items? Is it just a substring match, or the nearest word? – melvil james Nov 26 '18 at 14:16
  • For example: MERCEDES-AMG GMBH and MERCEDES-BENZ ENERGY GMBH are two different entries in the list but may refer to the same company. So, by looking at the two names, we can say that they are highly similar. Also, I can check the country code to verify that the companies are located in the same region. – Akshay Gupta Nov 27 '18 at 04:04
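
The check described in the last comment could be sketched roughly as follows: accept a pair only when the country codes agree and the name similarity clears a threshold. Everything here is an assumption for illustration: the `is_probable_match` helper, the 0.6 threshold, the use of `difflib.SequenceMatcher` as the similarity measure, and the country codes "DE" and "US", which do not come from the actual data.

```python
from difflib import SequenceMatcher


def is_probable_match(name_a: str, country_a: str,
                      name_b: str, country_b: str,
                      threshold: float = 0.6) -> bool:
    """Return True when both records share a country code and their
    names are at least `threshold` similar (threshold is an assumption)."""
    if country_a != country_b:
        return False
    score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    return score >= threshold


# Country codes below are made up for the example.
same_region = is_probable_match(
    "MERCEDES-AMG GMBH", "DE", "MERCEDES-BENZ ENERGY GMBH", "DE")
other_region = is_probable_match(
    "MERCEDES-AMG GMBH", "DE", "MERCEDES-BENZ ENERGY GMBH", "US")

print(same_region, other_region)
```

The threshold would need tuning against a hand-checked sample, since a value that is too low merges distinct companies and one that is too high misses legitimate variants.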