I have two CSV files. One is the original file, and the other is a deduped version of it. I have read each into a list, and for all intents and purposes they are in the same format. Each list item is a string.
I am trying to use a list comprehension to find out which items were removed by the deduplication. The length of the original list is 16939 and the length of the deduped list is 15368. That's a difference of 1571, but the list comprehension returns only 368 items. Any ideas?
# read in the deduped file
with open('account_de_ex.csv', 'r') as f:
    deduped = f.read().split("\r")
# read in the file with just the account names from the full account list
with open('account_names.csv', 'r') as f:
    account_names = f.read().split("\r")
# Get all the accounts that were deleted in the dedupe - i.e. get the duplicate accounts
dupes = [ele for ele in account_names if ele not in deduped]
Edit: In response to some of the comments, here is a test on the lists themselves, printing the length of each one before and after converting it to a set. The unique counts differ by only 20 or so, not the 1,500 I need! Thanks!
print len(deduped)
deduped = set(deduped)
print len(deduped)
print len(account_names)
account_names = set(account_names)
print len(account_names)
15368
15368
16939
15387
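One possible reading of these counts, offered as a sketch rather than a confirmed diagnosis: the 16939 names contain only 15387 distinct values, so 16939 - 15387 = 1552 entries are repeats of a name that already occurs in the same list. If the dedupe kept one copy of each repeated name (an assumption), then `ele not in deduped` is false for every one of those repeats, and the comprehension can only report names that are missing from the deduped file entirely. Counting occurrences with `collections.Counter` would surface the repeated names instead. The sample data and the helper name `find_repeated` below are hypothetical.

```python
from collections import Counter


def find_repeated(names):
    """Return a dict mapping each name that occurs more than once to its count."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical sample data standing in for the two CSV-derived lists.
account_names = ["acme", "globex", "acme", "initech", "globex", "acme"]
deduped = ["acme", "globex", "initech"]

# The membership test finds nothing here, because every name
# still has one copy left in the deduped list.
missing = [ele for ele in account_names if ele not in deduped]

# Counting occurrences reveals which names were repeated and how often.
repeated = find_repeated(account_names)

# Number of entries a keep-one-copy dedupe would remove.
removed = sum(n - 1 for n in repeated.values())

print(missing)
print(sorted(repeated.items()))
print(removed)
```

With the real data, `find_repeated` would be called on the list read from `account_names.csv`, before it is converted to a set.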