So I have a CSV file full of transactions, with the vendor name in one column and the transaction amount in another. The goal is to find the top vendors in terms of total number of transactions. That part is pretty simple, and I have code like so:
with open('Transactions.csv') as Vendor_Data:
file_reader = csv.reader(Vendor_Data, delimiter=',')
vendor_dict = {}
next(file_reader)
for row in file_reader:
if row[3] not in vendor_dict:
vendor_dict[row[3]] = [0, 0]
vendor_dict[row[3]][1] += round(float(row[1]), 2)
else:
vendor_dict[row[3]][0] += 1
vendor_dict[row[3]][1] += round(float(row[1]), 2)
The problem is, there are many entries where the same vendor is spelled slightly different ("Delta Airlines" v. "Delta Air"). What is the best way to detect these similar string names (say, with Fuzzywuzzy) while looping through the CSV file, and consolidate the transaction instance and amounts?