Is there a way to filter data before or while reading it into a data frame?
For example, I have the following CSV data file:
time, Event, price, Volume
00:00:00.000, B, 920.5, 57
00:00:00.000, A, 920.75, 128
00:00:00.898, T, 920.75, 1
00:00:00.898, T, 920.75, 19
00:00:00.906, B, 920.5, 60
00:00:41.284, T, 920.75, 5
00:00:57.589, B, 920.5, 53
00:01:06.745, T, 920.75, 3
00:01:06.762, T, 920.75, 2
I would like to read only the rows where 'Event' == 'T' and 'Volume' >= 100.
This is easy to accomplish by reading the entire dataset in and then filtering it, and that is what I am doing right now.
Each of my files is 10 MB and there are thousands of them (about 15 GB of data in total), so this procedure takes forever. I am wondering whether there is a way to filter the data while reading it in, or some other method to speed things up a little. Would using a database be a better option?
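To make the idea of "filtering while reading" concrete, here is a minimal sketch in Python using only the standard library. It streams the file one line at a time and keeps only the matching rows, so the full file is never held in memory. The function name `filter_rows`, its parameters, and the assumption that the columns always appear in the order time, Event, price, Volume are illustrative choices, not part of any existing tool.

```python
import csv
import io

# The sample rows from the question, used here only for demonstration.
SAMPLE = """\
time, Event, price, Volume
00:00:00.000, B, 920.5, 57
00:00:00.000, A, 920.75, 128
00:00:00.898, T, 920.75, 1
00:00:00.898, T, 920.75, 19
00:00:00.906, B, 920.5, 60
00:00:41.284, T, 920.75, 5
00:00:57.589, B, 920.5, 53
00:01:06.745, T, 920.75, 3
00:01:06.762, T, 920.75, 2
"""


def filter_rows(lines, event="T", min_volume=100):
    """Yield (time, event, price, volume) for rows that pass the filter.

    `lines` is any iterable of text lines, e.g. an open file object.
    Assumes four columns in the order: time, Event, price, Volume.
    """
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # skip the header line
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed lines
        time, ev, price, volume = (field.strip() for field in row)
        if ev == event and int(volume) >= min_volume:
            yield time, ev, float(price), int(volume)


# None of the sample rows has Event == 'T' with Volume >= 100.
strict = list(filter_rows(io.StringIO(SAMPLE)))

# With a lower (hypothetical) threshold, some 'T' rows pass the filter.
relaxed = list(filter_rows(io.StringIO(SAMPLE), min_volume=5))
```

For a real file, the same generator can be fed an open file handle, for example `with open(path, newline="") as f: rows = list(filter_rows(f))`, where `path` is a placeholder for one of the data files. The filtered rows can then be written out to a much smaller file or loaded into a data frame.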