3

I am using R to reach csv. But i do not want whole dataset in memory as dataset is too large. But I need to read rows based on one column's category.

I want to read only rows where col2 = 'A'

Example : col1 col2 col 3
1 A 1000
2 B 2000
3 A 1000
4 A 2000
5 A 1000
6 B 2000

Anuj Gupta
  • 37
  • 1
  • 4

3 Answers3

7

You could try to use fread from data.table package with cmd option. From documentation:

A shell command that pre-processes the file; e.g. fread(cmd=paste("grep",word,"filename"). See Details.

Shell commands:

fread accepts shell commands for convenience. The input command is run and its output written to a file in tmpdir (link{tempdir}() by default) to which fread is applied "as normal". The details are platform dependent -- system is used on UNIX environments, shell otherwise; see system.

So if you run something like

library(data.table)
t <- fread(......., cmd=paste("grep","' A '","filename"), .....)

then it filters lines which contains A (A surrounded by spaces) and then apply fread to the result.

Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64
2

We could use sqldf

library(sqldf)
df1 <- read.csv.sql("file.csv", "select *, from file where col2 = 'A'", sep=",")
akrun
  • 874,273
  • 37
  • 540
  • 662
0

One of these should solve the issue:

fread(file=file_name, select=col_names)[specific_col_name %in% ID_name] 

or

fread(file=file_name, select=col_names)[grep(pattern, specific_col_name, ignore.case = TRUE)] 
Suraj Rao
  • 29,388
  • 11
  • 94
  • 103
KHOKHAR
  • 1
  • 1
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Sep 24 '21 at 08:00