I have a very large, multi-gigabyte file that is too costly to load into memory. The rows in the file are not in random order, however, so simply reading the first rows would not give a representative sample. Is there a way to read in a random subset of the rows using something like fread?
Something like this, for example?
data <- fread("data_file", nrows_sample = 90000)
This GitHub post suggests that one possibility is to do something like this:
fread("shuf -n 5 data_file")
This does not work for me, however. Any ideas?
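For reference, this is a minimal sketch of what I understand the shell side of that suggestion to be doing. It assumes GNU coreutils (`shuf`, `head`, `tail`) is installed and that the file has exactly one header line. The function name `sample_rows` and the output name `sample.csv` are hypothetical, used only for illustration.

```shell
#!/bin/sh
# Sketch: draw a random sample of data rows while keeping the header first.
# Assumptions: GNU coreutils is available and the file has a one-line header.

sample_rows() {
    # $1 = input file, $2 = number of data rows to sample
    head -n 1 "$1"                   # pass the header through unchanged
    tail -n +2 "$1" | shuf -n "$2"   # sample without replacement from the rest
}

# Example usage (file names are placeholders):
# sample_rows data_file 90000 > sample.csv
```

A plain `shuf -n 5 data_file` treats the header as just another line, which is why the sketch splits it off first. The sampled file could then be read with fread as usual. Recent data.table versions also accept a shell command through the `cmd` argument of fread, though I have not confirmed that this behaves the same on every platform.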