I am new to Python and have been trying to merge two Excel files for days. I have researched merging endlessly and tried to adapt existing code to fit my needs, but it hasn't been working. I was wondering if I could get any help understanding why my code isn't working. I feel this could be a common problem for others using Python, so hopefully this will help others as well. I appreciate any comments!
I have two Excel files saved as CSV, 'Chinese Scores3.csv' and 'Chinese Scores4.csv', which I am trying to merge by an ID that is unique to each company. Other than the company ID, the two files have no columns in common. Also, not all companies are listed in both files: some appear in both, but others appear in only one or the other. I would like to put all the information for a company ID together in one row of a single sheet. For example, the first file has the columns ID, JanSales, FebSales, etc., and the second file has the columns ID, CreditScore, EMMAScore, etc. The file I would like to create has the columns ID, JanSales, FebSales, CreditScore, EMMAScore, with each row matched by company ID.
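As a concrete illustration of the desired result, here is a minimal sketch that uses a few made-up rows in place of the two files. The IDs and values are invented, and `merge_by_id` is a hypothetical helper name. A company that appears in only one file keeps its row, with the missing columns left blank.

```python
# Hypothetical sample rows standing in for the two files; all values are made up.
sales = [
    {"ID": "A1", "JanSales": "100", "FebSales": "120"},
    {"ID": "B2", "JanSales": "80", "FebSales": "95"},
]
scores = [
    {"ID": "B2", "CreditScore": "710", "EMMAScore": "3"},
    {"ID": "C3", "CreditScore": "640", "EMMAScore": "5"},
]
columns = ["ID", "JanSales", "FebSales", "CreditScore", "EMMAScore"]


def merge_by_id(*tables):
    """Collect every row under its company ID, combining columns from all tables."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row["ID"], {}).update(row)
    return merged


merged = merge_by_id(sales, scores)
# One output row per company; columns a company lacks are left as empty strings.
result = [[merged[cid].get(col, "") for col in columns] for cid in sorted(merged)]
for row in result:
    print(",".join(row))
# A1,100,120,,
# B2,80,95,710,3
# C3,,,640,5
```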
Does this make sense? It's like using VLOOKUP in Excel, but I would like to do it in Python. Here is my code, which isn't working. I have tried modifying it, but without success. I hope to get feedback!
    import sys
    import csv

    def main(arg):
        headers= []
        for arg in 'Chinese Scores3.csv':
            with open(arg) as f:
                curr = 'Chinese Scores3.csv'.reader(f).next()
                headers.append(curr)
                try:
                    keys=list( set(keys) & set (curr))
                except NameError:
                    keys = curr
        header = list(keys)
        for h in headers:
            header += [ k for k in h if k not in keys ]
        data = {}
        for arg in 'Chinese Scores4.csv':
            with open(arg) as f:
                reader = 'Chinese Scores4.csv'.DictReader(f)
                for line in reader:
                    data_key = tuple([ line[k] for k in keys ])
                    if not data_key in data: data[data_key] = {}
                    for k in header:
                        try:
                            data[data_key][k] = line[k]
                        except KeyError:
                            pass
        for key in data.keys():
            for col in header:
                if key in data and not col in data[key]:
                    del( data[key] )
        print ','.join(header)
        for key in sorted(data):
            row = [ data[key][col] for col in header ]
            print ','.join(row)

    if __name__ == '__main__':
        sys.exit( main( sys.argv[1:]) )
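For comparison, here is a minimal sketch of how the script above might be reworked. It is not the only way to do this. A few things in the attempt stand out:

- `for arg in 'Chinese Scores3.csv'` loops over the characters of the string rather than over a list of file names.
- `'Chinese Scores3.csv'.reader(f)` calls a method on a string, where `csv.reader(f)` was presumably intended.
- The final deletion loop removes every company that is missing any column, which discards exactly the companies that appear in only one file.

The sketch assumes Python 3, that both files have a header row containing an `ID` column, and that the output name `merged.csv` is acceptable. `read_rows`, `merge_csv`, and `merged.csv` are hypothetical names chosen for illustration.

```python
import csv


def read_rows(path, key="ID"):
    """Return (column names, {id: row dict}) for one CSV file with a header row."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        rows = {row[key]: row for row in reader}
    return columns, rows


def merge_csv(paths, out_path, key="ID"):
    """Combine several CSV files on `key`, keeping IDs found in any of the files."""
    header = [key]
    merged = {}
    for path in paths:
        columns, rows = read_rows(path, key)
        # Append columns not seen yet, preserving the order in which files are given.
        header += [c for c in columns if c not in header]
        for company_id, row in rows.items():
            merged.setdefault(company_id, {}).update(row)

    with open(out_path, "w", newline="") as f:
        # restval="" leaves a blank cell where a company has no value for a column.
        writer = csv.DictWriter(f, fieldnames=header, restval="")
        writer.writeheader()
        for company_id in sorted(merged):
            writer.writerow(merged[company_id])
    return header


# Example call (assumes both files are in the working directory):
# merge_csv(["Chinese Scores3.csv", "Chinese Scores4.csv"], "merged.csv")
```

The IDs are compared as plain strings, so `007` and `7` would be treated as different companies. If pandas is available, `DataFrame.merge` with `how="outer"` is another common way to get the same kind of result.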