I am on a server with 512 GB of RAM. I have an 84 GB CSV (hefty, I know). I am reading only 31 of its 79 columns; the excluded columns are all floats/decimals.
After comparing several methods, fread seems to be the highest-performance way to do what I want. The file is 84 GB, but watching "top", the process uses about 160 GB of memory (RES), even though the resulting data.table is only about 20 GB.
I know fread preallocates memory, which is part of why it's so fast. I'm just wondering: is this normal, and is there a way to curb the memory consumption?
Edit: even if I ask fread to read only 10,000 rows (of roughly 300 million), it still seems to preallocate 84 GB of memory.
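For reference, the call is roughly like this (file path and column names are placeholders, not my real ones):

```r
library(data.table)

# Hypothetical names standing in for the 31 columns I actually keep;
# select= makes fread skip the excluded float columns entirely.
keep <- c("id", "date", "value")

dt <- fread(
  "big.csv",
  select = keep,   # read only these columns
  nrows  = 10000   # limit rows while testing -- memory use is still ~84 GB
)
```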