0

I have four columns of data that I am trying to consolidate based on two conditions. The data are formatted as follows:

CountyName  Year    Oil Gas
ANDERSON    2010    1358    0
ANDERSON    2010    621746  4996766
ANDERSON    2011    1587    0
ANDERSON    2011    633120  5020877
ANDERSON    2012    55992   387685
ANDERSON    2012    1342    0
ANDERSON    2013    635572  3036578
ANDERSON    2013    4873    0
ANDERSON    2014    656440  2690333
ANDERSON    2014    12332   0
ANDERSON    2015    608454  2836272
ANDERSON    2015    23339   0
ANDERSON    2016    551728  2682261
ANDERSON    2016    12716   0
ANDERSON    2017    132466  567874
ANDERSON    2017    1709    0
ANDREWS 2010    25701725    1860063
ANDREWS 2010    106351  0
ANDREWS 2011    97772   0
ANDREWS 2011    28818329    1377865
ANDREWS 2012    105062  0
...

I'm interested in combining the respective oil and then gas values for entries that are repeated. For example, I'd like to add all the oil entries for Anderson County for the year 2010 and have that value replace the existing entries in just one row. The code I am using now is summing all the values in the respective county regardless of year, giving me a condensed output like this:

CountyName  Year    Oil Gas
ANDERSON        3954774 
ANDREWS      206472698  
...

Here's the code I am using:

import csv
with open('Texas.csv', 'r') as Texas: #opening Texas csv file
    TexasReader = csv.reader(Texas)
    counties = {}
    years = {}

    index = 0 and 1
    for row in TexasReader:
        if index == 0 and 1:
            header = row
        else:
            county = row[0]
            year = row[1]
            oil = row[2]
            gas = row[3]

            if county in counties: 
                counties[county] += int(oil) 
       else:
               counties[county] = int(oil)
        index += 1

    with open('TexasConsolidated.csv', 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=header, delimiter=',', lineterminator='\n')
        writer.writeheader()
        for k, v in counties.items():
            writer.writerow({header[0]: k, header[2]: v})
Toby Speight
  • 27,591
  • 48
  • 66
  • 103
tda0715
  • 11

1 Answers1

0

This is the line that is doing what you complain of:

if county in counties: 
    counties[county] += int(oil) 

If you want a dict that stores sums over two keys then both values need to be in the dict key.

Add the line

counties_years = {}

then sum like this, using the tuple (county,year) as the key:

if (county,year) in counties_years: 
    counties_years[(county,year)] += int(oil) 
else:
    counties_years[(county,year)] = int(oil)
BoarGules
  • 16,440
  • 2
  • 27
  • 44