I have a very large csv file with roughly 30,000 lines and 25 columns that gets produced daily. I need to filter this file to contain only rows that are of interest for me. It is of the form:
date, time, user, entity, party1, party2
20131001, 00:01, user1, ABC, XXX, XXX
20131002, 00:01, user2, XYZ/ABC, XXX, ABC
20131003, 00:01, user1, DEF, ABC, XXX
For example I need to delete all rows that have entity=ABC. I was thinking of either
read the file in and delete each line that contains ABC, but that would get rid of lines that I actually need. I only want to delete lines that contain ABC in the entity column to be removed.
use the csv module in python and try to achieve the same. I've read the functions available in csv, but it doesn't seem to provide anything that let's me delimit by field per column.
I am not necessarily looking for an answer in code, but any general advice on how to solve this problem would be welcome.
Thanks a lot.