Converting values of named tuples from strings to integers

Question

I'm creating a script to read a CSV file into a set of named tuples whose field names come from the column headers. I will then use these namedtuples to pull out rows of data which meet certain criteria.

I've worked out the input (shown below), but am having issues with filtering the data before outputting it to another file.

import csv
from collections import namedtuple

with open('test_data.csv') as f:
    f_csv = csv.reader(f) #read using csv.reader()
    Base = namedtuple('Base', next(f_csv)) #create namedtuple keys from header row
    for r in f_csv: #for each row in the file
        row = Base(*r) 
        # Process row
        print(row) #print data

The contents of my input file are as follows:

Locus           Total_Depth     Average_Depth_sample    Depth_for_17
chr1:6484996    1030            1030                    1030
chr1:6484997    14              14                      14
chr1:6484998    0               0                       0

And they are printed from my code as follows:

Base(Locus='chr1:6484996', Total_Depth='1030', Average_Depth_sample='1030', Depth_for_17='1030')
Base(Locus='chr1:6484997', Total_Depth='14', Average_Depth_sample='14', Depth_for_17='14')
Base(Locus='chr1:6484998', Total_Depth='0', Average_Depth_sample='0', Depth_for_17='0')

I want to be able to pull out only the records with a Total_Depth greater than 15.

Intuitively, I tried the following check:

if Base.Total_Depth >= 15:
    print(row)

However, this only prints the final row of data (from the output above). I think the problem is twofold. First, as far as I can tell, I'm not storing my named tuples anywhere for them to be referenced later. Second, the numbers are being read as strings rather than as integers.

Firstly, can someone correct me if I need to store my namedtuples somewhere?

And secondly, how do I convert the string values to integers? Or is this not possible because namedtuples are immutable?

Thanks!

I previously asked a similar question with respect to dictionaries, but now would like to use namedtuples instead. :)

Martijn Pieters · Accepted Answer · 2013-07-19T14:39:00.957


Map your values to int when creating the named tuple instances:

row = Base(r[0], *map(int, r[1:]))

This keeps the r[0] value as a string, and maps the remaining values to int().

This does require knowing the CSV layout, because the choice of which columns are converted to integers is hardcoded here.

Demo:

>>> from collections import namedtuple
>>> Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])
>>> r = ['chr1:6484996', '1030', '1030', '1030']
>>> Base(r[0], *map(int, r[1:]))
Base(Locus='chr1:6484996', Total_Depth=1030, Average_Depth_sample=1030, Depth_for_17=1030)
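
If the numeric columns are not known in advance, one alternative (not part of the original answer) is to attempt the conversion on every field and fall back to the string when it fails. The sketch below assumes that any field which parses as an integer should become one; the helper name `to_int_if_possible` is hypothetical.

```python
from collections import namedtuple


def to_int_if_possible(value):
    """Return int(value) if the string parses as an integer, else the string unchanged."""
    try:
        return int(value)
    except ValueError:
        return value


Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])
r = ['chr1:6484996', '1030', '1030', '1030']

# 'chr1:6484996' is not a valid integer literal, so it stays a string;
# the remaining fields are converted.
row = Base(*map(to_int_if_possible, r))
print(row)
```

The trade-off is that a column meant to stay textual but containing only digits would also be converted, so the hardcoded version is safer when the layout is fixed.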

Note that you should test against the rows, not the Base class:

if row.Total_Depth >= 15:

within the loop, or in a new loop of collected rows.
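
Putting the pieces together, a minimal sketch of the whole loop might look like this. It assumes a comma-delimited file with the layout from the question; an in-memory `io.StringIO` stands in for `test_data.csv` so the example is self-contained, and the list name `selected` is illustrative.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for test_data.csv (comma-delimited layout assumed).
sample = io.StringIO(
    "Locus,Total_Depth,Average_Depth_sample,Depth_for_17\n"
    "chr1:6484996,1030,1030,1030\n"
    "chr1:6484997,14,14,14\n"
    "chr1:6484998,0,0,0\n"
)

f_csv = csv.reader(sample)
Base = namedtuple('Base', next(f_csv))  # field names come from the header row

selected = []  # store matching rows so they can be used after the loop
for r in f_csv:
    # Keep the first column as a string and convert the rest to int.
    row = Base(r[0], *map(int, r[1:]))
    # Test the instance (row), not the class (Base).
    if row.Total_Depth >= 15:
        selected.append(row)

print(selected)
```

Collecting the matching rows in a list answers the storage part of the question: the namedtuples only persist beyond one loop iteration if they are appended to some container.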

edited Jul 19 '13 at 14:39

answered Jul 19 '13 at 14:29

Martijn Pieters


Thanks. I can see this outputs the latter three namedtuples (equivalent to my spreadsheet columns) as integers. However, when I try to use my if statement to filter them it still only pulls out the one with Total_Depth=0. Is this because my if statement is outside the first function? – s_boardman Jul 19 '13 at 14:37

@s_boardman: Updated; `Base.Total_Depth` is a property object, not an integer; you were probably looking for `row.Total_Depth` instead. – Martijn Pieters Jul 19 '13 at 14:39
That's great, thanks very much! Now to output them to a new CSV file using the namedtuple keys as column headers. :) – s_boardman Jul 19 '13 at 14:45
