I've seen lots of Q&A on this topic, but none contain the type of output I'm looking for. Any words of wisdom on this would be very much appreciated!

  • I have 2 lists... both lists contain 1 column, consisting of Full Name|University (i.e., name and university, concatenated, and separated by a pipe)
  • There's not always an exact match, due to nicknames and university abbreviations. I want to compare each record in list 1 with each record in list 2, and find the closest match.
  • I then want to produce an output file with 3 columns: every item from list 1, the closest match from list 2, and the match %.

Does anyone have sample code they could share? Thanks!

Andrew G.
  • You probably won't get much help without showing some code you've tried, but [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy) is a nice library for this. – Jack Jan 05 '17 at 03:02
  • Try to explain the problem in terms of Python types, i.e., _columns_ and _records_ are not Python data types and sound domain-specific. As Jack recommended, some code examples are always good. – shusson Jan 05 '17 at 03:10
  • I'm super new to Python -- any code I tried for this has bombed, so I thought I'd ask here. Pardon my noob naming conventions. – Andrew G. Jan 05 '17 at 03:16

1 Answer

To get you started, here is an answer that finds exact matches on either the full name or the university. You could extend it to do fuzzy matching with a library like fuzzywuzzy:

  1. For both lists, split each string into a [full name, university] list (a string without the '|' character yields a one-element list, so you might need an if statement here, or a try/except around any later lookup of the second element):

    new_list = [item.split('|') for item in old_list]

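  2. Use the following list comprehension to match on either element (assuming that both lists have already been split as in step 1, and that one is called list1 and the other list2). The result contains a list1 entry once for every list2 entry it matches: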

    matches = [val for val in list1 for item in list2 if val[0] == item[0] or val[1] == item[1]]
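
For the three-column output described in the question, here is a minimal sketch of one possible approach. It swaps in `difflib.SequenceMatcher` from the standard library rather than fuzzywuzzy, so nothing needs to be installed, and it compares the unsplit "Full Name|University" strings directly. The function names (`similarity`, `closest_matches`, `write_matches`), the output file name `matches.csv`, and the sample records are all made up for illustration.

```python
import csv
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score, ignoring case and surrounding spaces."""
    return 100 * SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def closest_matches(list1, list2):
    """For every record in list1, find the most similar record in list2."""
    rows = []
    for record in list1:
        # max() keeps the first candidate when several share the top score.
        best = max(list2, key=lambda candidate: similarity(record, candidate))
        rows.append((record, best, round(similarity(record, best), 1)))
    return rows


def write_matches(rows, path):
    """Write (list1 item, closest list2 item, match %) rows to a CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["list1", "closest_in_list2", "match_pct"])
        writer.writerows(rows)


# Hypothetical sample data, not taken from the question.
list1 = ["Bob Smith|MIT", "Jane Doe|Stanford University"]
list2 = ["Jane Doe|Stanford", "Robert Smith|MIT", "Alice Wong|UCLA"]

rows = closest_matches(list1, list2)
write_matches(rows, "matches.csv")
```

This compares every record in list1 against every record in list2, so the running time grows with the product of the two list lengths. The score is a character-level similarity, which will probably not recognise nicknames or abbreviations that share few letters with the full form (for example "Bill" and "William"), so low-scoring rows are worth reviewing by hand.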

David Whitlock