How do I read in a subset of a large dataset that satisfies a certain condition?

Asked Oct 06 '19 at 09:49

I have a large CSV dataset (>2GB), but I would only like to read a small subset of it into R: for example, the rows where the first column equals 'a'. How do I do this quickly, ideally without having to read the whole dataset in? I don't mind subsetting with a large-file text editor or the command line first if that is what's necessary, as long as it's quick.

asked Oct 06 '19 at 09:49

Vykta Wakandigara

You can do this with `read.csv.sql` from the `sqldf` package (see https://stackoverflow.com/questions/35847044/specific-rows-in-fread), but I don't think you gain any performance over just using `fread` from the `data.table` package and then subsetting (see https://stackoverflow.com/questions/23502974/using-fread-to-select-rows-and-columns-the-way-read-csv-sql-does). – caldwellst Oct 06 '19 at 10:06
I think the fread option appears to be better and faster. Thanks! – Vykta Wakandigara Oct 06 '19 at 11:02
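
For the command-line route the question mentions, a minimal sketch with `awk` might look like the following. The file names `big.csv` and `subset.csv` and the sample rows are hypothetical stand-ins, and the filter assumes plain comma-separated fields with no quoting, no commas inside values, and a header on the first line.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the large CSV.
workdir=$(mktemp -d)
cat > "$workdir/big.csv" <<'EOF'
key,value
a,1
b,2
a,3
c,4
EOF

# Keep the header line (NR == 1) and every row whose first field is exactly "a".
# Assumption: unquoted fields; a quoted value such as "a" would not match.
awk -F',' 'NR == 1 || $1 == "a"' "$workdir/big.csv" > "$workdir/subset.csv"

# Show the filtered rows that would then be read into R.
cat "$workdir/subset.csv"
```

The resulting `subset.csv` is small enough to read with `fread` or `read.csv` as usual. `fread` also accepts a `cmd` argument, so a filter like this can probably be run from within R instead of writing an intermediate file. Whether either approach beats reading the whole file with `fread` and subsetting afterwards depends on the data and the machine.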
