I have four columns of data that I am trying to consolidate based on two conditions. The data are formatted as follows:
CountyName Year Oil Gas
ANDERSON 2010 1358 0
ANDERSON 2010 621746 4996766
ANDERSON 2011 1587 0
ANDERSON 2011 633120 5020877
ANDERSON 2012 55992 387685
ANDERSON 2012 1342 0
ANDERSON 2013 635572 3036578
ANDERSON 2013 4873 0
ANDERSON 2014 656440 2690333
ANDERSON 2014 12332 0
ANDERSON 2015 608454 2836272
ANDERSON 2015 23339 0
ANDERSON 2016 551728 2682261
ANDERSON 2016 12716 0
ANDERSON 2017 132466 567874
ANDERSON 2017 1709 0
ANDREWS 2010 25701725 1860063
ANDREWS 2010 106351 0
ANDREWS 2011 97772 0
ANDREWS 2011 28818329 1377865
ANDREWS 2012 105062 0
...
I'm interested in combining the respective oil and then gas values for entries that are repeated. For example, I'd like to add all the oil entries for Anderson County for the year 2010 and have that value replace the existing entries in just one row. The code I am using now is summing all the values in the respective county regardless of year, giving me a condensed output like this:
CountyName Year Oil Gas
ANDERSON 3954774
ANDREWS 206472698
...
Here's the code I am using:
import csv
with open('Texas.csv', 'r') as Texas: #opening Texas csv file
TexasReader = csv.reader(Texas)
counties = {}
years = {}
index = 0 and 1
for row in TexasReader:
if index == 0 and 1:
header = row
else:
county = row[0]
year = row[1]
oil = row[2]
gas = row[3]
if county in counties:
counties[county] += int(oil)
else:
counties[county] = int(oil)
index += 1
with open('TexasConsolidated.csv', 'w') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=header, delimiter=',', lineterminator='\n')
writer.writeheader()
for k, v in counties.items():
writer.writerow({header[0]: k, header[2]: v})