I'm writing a script that reads a CSV file into a set of named tuples whose fields come from the column headers. I will then use these namedtuples to pull out rows of data that meet certain criteria.
I've worked out the input step (shown below), but I'm having trouble filtering the data before writing it to another file.
    import csv
    from collections import namedtuple

    with open('test_data.csv', newline='') as f:
        f_csv = csv.reader(f)  # read using csv.reader()
        Base = namedtuple('Base', next(f_csv))  # create namedtuple fields from the header row
        for r in f_csv:  # for each row in the file
            row = Base(*r)
            # Process row
            print(row)  # print data
The contents of my input file are as follows:
    Locus          Total_Depth  Average_Depth_sample  Depth_for_17
    chr1:6484996   1030         1030                  1030
    chr1:6484997   14           14                    14
    chr1:6484998   0            0                     0
My code prints them as follows:
    Base(Locus='chr1:6484996', Total_Depth='1030', Average_Depth_sample='1030', Depth_for_17='1030')
    Base(Locus='chr1:6484997', Total_Depth='14', Average_Depth_sample='14', Depth_for_17='14')
    Base(Locus='chr1:6484998', Total_Depth='0', Average_Depth_sample='0', Depth_for_17='0')
I want to be able to pull out only the records with a Total_Depth greater than 15.
Intuitively, I tried the following check:
    if Base.Total_Depth >= 15:
        print(row)
However, this only prints the final row of data from the output above. I think the problem is twofold. First, as far as I can tell, I'm not storing my named tuples anywhere so that they can be referenced later. Second, the numbers are being read as strings rather than as integers.
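For reference, a minimal sketch of a per-row filter that does not need to store anything: the check runs on each `row` instance inside the loop (rather than on the `Base` class), and `int()` converts `Total_Depth` before comparing. It assumes the file is comma-separated (which the `csv.reader` output above suggests), and it uses `io.StringIO` in place of `test_data.csv` and the output file. The names `filter_rows` and `SAMPLE` are illustrative only.

```python
import csv
import io
from collections import namedtuple

# Inline stand-in for test_data.csv (assumed to be comma-separated).
SAMPLE = """Locus,Total_Depth,Average_Depth_sample,Depth_for_17
chr1:6484996,1030,1030,1030
chr1:6484997,14,14,14
chr1:6484998,0,0,0
"""


def filter_rows(src, dst, threshold=15):
    """Copy rows whose Total_Depth exceeds threshold from src to dst (hypothetical helper)."""
    reader = csv.reader(src)
    Base = namedtuple('Base', next(reader))  # fields come from the header row
    writer = csv.writer(dst)
    writer.writerow(Base._fields)  # write the same header to the output
    kept = 0
    for r in reader:
        row = Base(*r)  # an instance of Base, built from this row
        if int(row.Total_Depth) > threshold:  # compare as an integer, not a string
            writer.writerow(row)
            kept += 1
    return kept


out = io.StringIO()
print(filter_rows(io.StringIO(SAMPLE), out))
print(out.getvalue())
```

With real files, the call might look like `with open('test_data.csv', newline='') as f, open('filtered.csv', 'w', newline='') as g: filter_rows(f, g)`, where `filtered.csv` is a placeholder name.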
First, could someone tell me whether I need to store my namedtuples somewhere?
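If the rows are needed again after the loop (for several different filters, say), one option is to collect the namedtuples in a list. A rough sketch under the same assumptions (comma-separated data, `io.StringIO` standing in for the file, hypothetical names `load_rows` and `SAMPLE`); note that the whole file is then held in memory and the values are still strings.

```python
import csv
import io
from collections import namedtuple

# Inline stand-in for test_data.csv (assumed to be comma-separated).
SAMPLE = """Locus,Total_Depth,Average_Depth_sample,Depth_for_17
chr1:6484996,1030,1030,1030
chr1:6484997,14,14,14
chr1:6484998,0,0,0
"""


def load_rows(src):
    """Read every data row into a list of namedtuples (hypothetical helper)."""
    reader = csv.reader(src)
    Base = namedtuple('Base', next(reader))  # fields come from the header row
    return [Base(*r) for r in reader]  # keep every row, not just the last one


rows = load_rows(io.StringIO(SAMPLE))
# The stored list can be filtered later; values are still strings, so convert with int().
deep = [row for row in rows if int(row.Total_Depth) > 15]
print([row.Locus for row in deep])
```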
Second, how do I convert the string values to integers? Or is that impossible because namedtuples are immutable?
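On immutability: a namedtuple cannot be changed in place, but that does not block conversion. `_replace()` returns a new namedtuple with the named fields changed, and `_make()` builds one from any iterable. A small sketch, using the field names from the header above and assuming every column after `Locus` holds an integer:

```python
from collections import namedtuple

Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])

raw = Base('chr1:6484997', '14', '14', '14')  # string values, as csv.reader returns them

# _replace returns a new namedtuple; raw itself is left unchanged.
converted = raw._replace(Total_Depth=int(raw.Total_Depth))

# _make rebuilds the whole tuple, converting every column after Locus (assumed numeric).
all_ints = Base._make([raw.Locus] + [int(v) for v in raw[1:]])

print(raw.Total_Depth, converted.Total_Depth, all_ints)
```

Converting once, when each row is read, means later comparisons such as `row.Total_Depth > 15` work directly on integers.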
Thanks!
I previously asked a similar question with respect to dictionaries, but now would like to use namedtuples instead. :)