How do I read a file while filtering rows based on a condition in R?

Question

I am using R to read a CSV file, but I do not want the whole dataset in memory because it is too large. I need to read only the rows that match a given category in one column.

I want to read only rows where col2 = 'A'

Example:

col1 col2 col3
1    A    1000
2    B    2000
3    A    1000
4    A    2000
5    A    1000
6    B    2000

score 7 · Answer 1 · answered May 24 '20 at 19:03

You could try fread from the data.table package with its cmd option. From the documentation:

A shell command that pre-processes the file; e.g. fread(cmd=paste("grep",word,"filename")). See Details.

Shell commands:

fread accepts shell commands for convenience. The input command is run and its output written to a file in tmpdir (tempdir() by default) to which fread is applied "as normal". The details are platform dependent -- system is used on UNIX environments, shell otherwise; see system.

So if you run something like

library(data.table)
t <- fread(......., cmd=paste("grep","' A '","filename"), .....)

then grep keeps only the lines that contain A surrounded by spaces, and fread is applied to the result, so the non-matching rows never reach R.
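
As a self-contained sketch of this approach, the snippet below writes the question's sample rows to a temporary comma-separated file and lets grep keep only the header and the rows whose second field is A before fread parses anything. The temporary path and the regular expression are illustrative assumptions, not part of the original answer; the sketch also assumes a Unix-like shell with grep available and fields that contain no embedded commas.

```r
library(data.table)

# Hypothetical sample file, written only so the sketch is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("col1,col2,col3",
             "1,A,1000",
             "2,B,2000",
             "3,A,1000",
             "4,A,2000",
             "5,A,1000",
             "6,B,2000"), path)

# Keep the header line (starts with "col1,") and every row whose
# second comma-separated field is exactly "A"
cmd <- sprintf("grep -E '^(col1,|[^,]*,A,)' %s", shQuote(path))

# fread runs the shell command and parses only its output
dt <- fread(cmd = cmd)
print(dt)
```

Anchoring the pattern to the second field avoids keeping rows where an A merely appears in some other column, which a plain search for ' A ' would also match.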

I think if the data is really big, perhaps `vroom` would be useful – akrun May 24 '20 at 19:07
@akrun This is the first time I have heard of the `vroom` package; it looks interesting. Thank you – Severin Pappadeux May 24 '20 at 19:18

score 2 · Answer 2 · answered May 24 '20 at 18:40

We could use sqldf

library(sqldf)
df1 <- read.csv.sql("file.csv", "select * from file where col2 = 'A'", sep=",")
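
A runnable sketch of the same idea, assuming a small comma-separated sample file created just for illustration (the temporary path is hypothetical). Inside the query the table is referred to as file, which is the name read.csv.sql gives to its input. The WHERE clause is evaluated in SQLite, so only the matching rows are returned to R, although the whole file is still loaded into a temporary database first.

```r
library(sqldf)

# Hypothetical sample file, written only so the sketch is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("col1,col2,col3",
             "1,A,1000",
             "2,B,2000",
             "3,A,1000",
             "4,A,2000",
             "5,A,1000",
             "6,B,2000"), path)

# The filter runs inside SQLite; only rows with col2 = 'A' come back to R
df1 <- read.csv.sql(path,
                    sql = "select * from file where col2 = 'A'",
                    header = TRUE, sep = ",")
print(df1)
```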

answered May 24 '20 at 18:40 by akrun

Does the SQL filter run only after the whole file has been read? The OP said he doesn't want the whole file in memory. – Severin Pappadeux May 24 '20 at 18:58
@SeverinPappadeux it imports data into a temporary SQLite database and then reads it into R. – akrun May 24 '20 at 19:00
Ah, I see. I proposed using fread with a shell command and grep, where the filter runs before the file is read into R – Severin Pappadeux May 24 '20 at 19:05

score 0 · Answer 3 · edited Sep 24 '21 at 07:53

One of these should solve the issue:

fread(file=file_name, select=col_names)[specific_col_name %in% ID_name]

or

fread(file=file_name, select=col_names)[grep(pattern, specific_col_name, ignore.case = TRUE)]
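
For illustration, here are the two calls with concrete values filled in for the placeholders; the sample file, the column names, and the pattern are assumptions made for this sketch. Note that fread still parses every row of the selected columns before the bracketed filter is applied, so this reduces the number of columns held in memory but does not skip rows while reading.

```r
library(data.table)

# Hypothetical sample file, written only so the sketch is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("col1,col2,col3",
             "1,A,1000",
             "2,B,2000",
             "3,A,1000",
             "4,A,2000",
             "5,A,1000",
             "6,B,2000"), path)

# Read only two columns, then keep rows whose col2 is in the wanted set
by_value <- fread(file = path, select = c("col1", "col2"))[col2 %in% "A"]

# Same column selection, but filter rows with a case-insensitive regex
by_pattern <- fread(file = path,
                    select = c("col1", "col2"))[grep("^a$", col2, ignore.case = TRUE)]

print(by_value)
print(by_pattern)
```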

edited Sep 24 '21 at 07:53 by Suraj Rao
answered Sep 24 '21 at 07:41 by KHOKHAR
