I have a very large csv file with roughly 30,000 lines and 25 columns that gets produced daily. I need to filter this file to contain only rows that are of interest for me. It is of the form:

date, time, user, entity, party1, party2
20131001, 00:01, user1, ABC, XXX, XXX
20131002, 00:01, user2, XYZ/ABC, XXX, ABC
20131003, 00:01, user1, DEF, ABC, XXX

For example I need to delete all rows that have entity=ABC. I was thinking of either

  1. read the file in and delete each line that contains ABC. But that would also remove lines I actually need; I only want to delete rows where ABC appears in the entity column.

  2. use the csv module in Python and try to achieve the same. I've read through the functions available in csv, but none of them seems to let me filter on the value of a specific column.

I am not necessarily looking for an answer in code, but any general advice on how to solve this problem would be welcome.

Thanks a lot.

thefourtheye
Eric

2 Answers


You can certainly do what you want with Python's csv module, as you suggest and as @DhruvPathak outlines in his answer, but I think it's much simpler to do it with a one-line awk script:

$ awk -F ', ' '{ if ($4 != "ABC") print; }' < file.txt
date, time, user, entity, party1, party2
20131002, 00:01, user2, XYZ/ABC, XXX, ABC
20131003, 00:01, user1, DEF, ABC, XXX

where file.txt contains your data.
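For comparison, the same filter in Python's csv module might look like the sketch below (it uses the sample data from the question inline; the `strip()` call accounts for the space that follows each comma in the file):

```python
import csv
import io

data = """date, time, user, entity, party1, party2
20131001, 00:01, user1, ABC, XXX, XXX
20131002, 00:01, user2, XYZ/ABC, XXX, ABC
20131003, 00:01, user1, DEF, ABC, XXX
"""

# Keep only rows whose entity column (4th field, index 3) is not "ABC".
# strip() removes the leading space left after splitting on bare commas.
kept = [row for row in csv.reader(io.StringIO(data))
        if row[3].strip() != "ABC"]

for row in kept:
    print(row)
```

In a real script you would replace `io.StringIO(data)` with an open file handle, and write `kept` back out with `csv.writer`.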

nickie
for mycsv_line in csv_reader:
    # entity is the 4th column, i.e. index 3 (not 4);
    # strip() removes the space that follows each comma
    if mycsv_line[3].strip() != "ABC":
        result.append(mycsv_line)
DhruvPathak