
I have a large 30GB file that I want to process.

I am trying to read it line-by-line in chunks since it cannot be loaded into memory.

Both `base::readLines` and `readr::read_lines_chunked` can only read chunks starting from the first line and running through to the last line of the file.

What I would like to do instead is specify something like this:

read lines 1:100
read lines 101:200
read lines 201:300
read lines 301:400
...
until the end of the file

I could do this in a loop if I could specify the exact lines to read, but I don't think either of the above-mentioned functions allows for this.

Is there a way to do this?
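
For reference, base R can produce exactly this sequential pattern if the file connection is kept open between calls: `readLines()` then continues where the previous call stopped instead of starting over from line 1. A minimal sketch, assuming the path is stored in `path`:

con <- file(path, open = "r")        # keep the connection open across calls
repeat {
  chunk <- readLines(con, n = 100)   # reads the *next* 100 lines each call
  if (length(chunk) == 0) break      # character(0) signals end of file
  # process `chunk` here
}
close(con)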

The `skip` argument in `readr::read_lines_chunked` allows for skipping the first n lines of the data file, but what I would need is to skip the first n and the last m lines.

For example, if the file has 1000 lines:

skipping the first 100 and the last 800 would read in lines 101:200.
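
A minimal sketch of that "skip the first n, stop after m" pattern, assuming readr's plain `read_lines()` (which, unlike `read_lines_chunked`, takes an `n_max` argument) and the path stored in `path`:

library(readr)
# skip the first 100 lines, then read at most 100 more: lines 101:200
chunk <- read_lines(path, skip = 100, n_max = 100)

Note that each call still scans the file from the beginning to honour `skip`, so looping this over a 30GB file rereads earlier data on every iteration.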

  • `readr::read_lines_chunked` has a `chunk_size` argument that limits the number of lines to read. So you can combine `skip` with `chunk_size` – Conor Neilson Jul 15 '19 at 14:33
  • But `chunk_size` will always read from `skip` to the end, right? How can I tell it to read lines 101:200, for example? If I specify `skip = 100` it will read lines 101-10000000 in chunks and won't stop reading until it reaches the end of the file – upabove Jul 15 '19 at 14:34
  • I had a similar problem and I used the LaF package. – James B Jul 15 '19 at 14:35
  • Thanks @JamesB, looks like that will work! – upabove Jul 15 '19 at 14:40

1 Answer


Thanks to @JamesB, the solution is:

library("LaF")
get_lines(file, line_numbers=c(100,101))
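
To loop over the whole file in fixed-size chunks this way, a sketch combining `get_lines()` with LaF's `determine_nlines()` (the chunk size of 100 is an assumption):

library(LaF)
n_total <- determine_nlines(file)   # count the lines in one pass over the file
for (start in seq(1, n_total, by = 100)) {
  chunk <- get_lines(file, line_numbers = start:min(start + 99, n_total))
  # process `chunk` here
}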