
I'm fairly new to R and am working with multi-gigabyte files with up to 100 million rows. Instead of reading in the entire data file, I'd prefer to read in one column, subset it based on a certain criterion, and use the resulting indices to read in data from another column for analysis.
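A minimal sketch of that first step, reading a single column without loading the rest of the file; the file name bigfile.txt, the tab delimiter, and the column name ID are assumptions for illustration, not details from the original setup.

## Sketch only: "bigfile.txt", the tab delimiter, and a column named "ID"
## are assumed for illustration.
library(data.table)

## fread's select argument reads just the named column, so the other columns
## never enter memory; as.data.frame keeps the ID[, 1] indexing below working.
ID <- as.data.frame(fread("bigfile.txt", sep = "\t", select = "ID"))

## Base-R alternative: "NULL" entries in colClasses drop columns while reading
## (here assuming three columns with ID first).
# ID <- read.table("bigfile.txt", sep = "\t", header = TRUE,
#                  colClasses = c("integer", "NULL", "NULL"))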

That is, I subset my column of data (let's call it ID) to find which rows match the values in another array, as below.

rowind <- which(ID[, 1] %in% Zid)

Then, when I read in another column for processing, I'd like to read in only the rows that match the numbers in rowind.

I have done a lot of searching for this. I know how to skip specific columns and skip a certain number of rows at the beginning of the file, but I don't know how to read in, say, rows 3, 5, 8, 11, 15, and so on.
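For reference, a minimal sketch of the contiguous-block reading described above; the file name, delimiter, skip/nrows values, and three-column layout are assumptions, not part of the original data.

## Sketch only: "bigfile.txt", the tab delimiter, and a three-column layout
## are assumed for illustration.
## skip jumps over a fixed number of leading rows, nrows caps how many rows
## are read, and "NULL" entries in colClasses drop unwanted columns -- but the
## result is always one contiguous block, not an arbitrary set such as
## c(3, 5, 8, 11, 15).
chunk <- read.table("bigfile.txt", sep = "\t", header = FALSE,
                    skip = 1000, nrows = 500,
                    colClasses = c("NULL", "numeric", "NULL"))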

Dan_Alexander
  • R is not suited for this kind of work. With many of the text-import functions you can choose a start and an end, but not multiple disjoint sets of rows. While there are pure-R workarounds, it would be faster to use a language that parses files quickly line by line, such as Python, awk, sed, or Perl. After you extract the necessary lines, import the file with R. – Blue Magister Feb 28 '14 at 14:55
  • possible duplicate of [In R, how can read lines by number from a large file?](http://stackoverflow.com/questions/7156770/in-r-how-can-read-lines-by-number-from-a-large-file) – Matthew Plourde Feb 28 '14 at 20:09
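A minimal sketch of the pre-filtering approach Blue Magister describes, assuming a Unix-like system with sed available and a hypothetical file bigfile.txt; the line numbers below are just the example rows from the question.

## Sketch only: "bigfile.txt" and the tab delimiter are assumed; adjust the
## line numbers (and allow for any header line) to match rowind.
## sed -n '3p;5p;8p;11p;15p' prints only those line numbers, so R parses just
## the rows that survive the filter.
wanted <- read.table(pipe("sed -n '3p;5p;8p;11p;15p' bigfile.txt"),
                     sep = "\t", header = FALSE)

## The same idea with data.table::fread, which can read from a command:
# library(data.table)
# wanted <- fread(cmd = "sed -n '3p;5p;8p;11p;15p' bigfile.txt")

The filter expression could be built from rowind with something like paste0(rowind, "p", collapse = ";").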

0 Answers