0

I have a file which have different record type in different row and this can be identified using the first column value of the row, a sample data set is given below

V1  V2    V3   V4
 1  ABC   DEF  10
 1  XYZ   QWE  11
 2  ASD   WER  1
 1  QWE   SDF  34
 2  FGH   AXD  35

Now, i want to read this file but not the entire file but only those rows for which value of V1 is 2 and want to discard the rows with value 1. This can easily be done in SAS and if we have to do in R, we can read the entire file and then subset it. But i don't want to read the entire file and then subset, is it possible to read these specific rows at the time of reading.

Thanks for your answers.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
Pawan
  • 1,066
  • 1
  • 10
  • 16
  • You could probably hack something together in R to do this, but it will probably be simpler to just call `sed` via `system` and then read in a file with just the lines you want. – joran Sep 18 '13 at 16:24
  • May you please elaborate on `sed` via `system` – Pawan Sep 18 '13 at 16:32

2 Answers2

5

I like to use R in combination with a pipe -- with thanks to the connections functionality we can do it one go:

 R> X <- read.table(pipe("awk '($1 != 1) {print $0}' /tmp/pawan.csv"), 
 +                  header=TRUE) 
 R> X    
   V1  V2  V3 V4 
 1  2 ASD WER  1  
 2  2 FGH AXD 35 
 R>   

Here we keep just one condition (ignore line with value V1 equal to 1) but one could easily do much in either awk, sed, perl, python, ...

Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
4

I'm literally just learning sed myself, and I'm not the best at regex either, so that's an open invitation to people to not just mention improvements in comments, but to just edit this directly:

system("sed -e '2,${ /^.1/d; }' ~/Desktop/test.txt > ~/Desktop/test1.txt")

That should skip the first line (the header) and then delete all the rows that begin with a space and a 1. (Which is how your example data ended up looking when I copy+pasted it onto my computer.)

joran
  • 169,992
  • 32
  • 429
  • 468