I have a large 30GB file that I want to process.
I am trying to read it line-by-line in chunks since it cannot be loaded into memory.
base::readLines and readr::read_lines_chunked are only able to read in chunks starting from the first line and finishing at the last line.
What I would like to do instead is specify something like this:
read lines 1:100
read lines 101:200
read lines 201:300
read lines 301:400
...
until the end of the file
I could do this in a loop if I could specify the exact lines to read in, but I think neither of the above-mentioned functions allows for this.
Is there a way to do this?
The skip argument in readr::read_lines_chunked allows for skipping the first n lines in the data file, but what I would need is to skip the first n and the last m lines.
For example, if the file has 1000 lines, skipping the first 100 and the last 800 would read in lines 101-200.
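
To make the request concrete, here is a rough sketch in base R of the access pattern I am after. The helper read_line_range is hypothetical (it is not part of base R or readr), and the small temporary file only stands in for the real data. It is deliberately naive: every call re-reads the file from the top and discards the skipped lines, so I expect it to be far too slow for a 30GB file.

```r
# Hypothetical helper: return lines `from`..`to` (1-based, inclusive) of a text file.
# Naive sketch: each call starts at the top of the file and discards the
# skipped lines in blocks, so memory use stays bounded but time does not.
read_line_range <- function(path, from, to, block = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  to_skip <- from - 1
  while (to_skip > 0) {
    got <- length(readLines(con, n = min(block, to_skip)))
    if (got == 0) return(character(0))  # file is shorter than `from`
    to_skip <- to_skip - got
  }
  readLines(con, n = to - from + 1)
}

# Small stand-in for the real file: 1000 lines
path <- tempfile(fileext = ".txt")
writeLines(sprintf("line %d", 1:1000), path)

# Skip the first 100 and the last 800 lines, i.e. read lines 101-200
chunk <- read_line_range(path, 101, 200)

# The loop I would like to run: 1:100, 101:200, ... until the end of the file
n_chunks <- 0
start <- 1
repeat {
  lines <- read_line_range(path, start, start + 99)
  if (length(lines) == 0) break
  n_chunks <- n_chunks + 1  # process `lines` here
  start <- start + 100
}
```

Because each iteration rescans everything before the requested range, the total work grows roughly quadratically with the file size, which is why I am looking for a way to address line ranges directly.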