I have a huge .txt file with over 600 million rows, around 27 GB in size. I used fread from data.table on a server with 256 GB of RAM and 32 processors. Reading just 10% of the data took around 3.5 hours, so reading the whole table would take around 35 hours on this server. What is a faster way to read such a big dataset?

1) Should I split it into multiple smaller files first and then read them in?
2) Does multicore work for fread?
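For context, here is roughly what I am running now, plus the split-then-read variant I am considering. The file name, split size, and thread count below are just placeholders, and I am not sure the `nThread` argument actually parallelizes the read (that is question 2):

```r
library(data.table)

## What I run now (path is a placeholder; the real file has 600M+ rows, ~27 GB)
setDTthreads(32)
dt <- fread("bigfile.txt", nThread = 32, showProgress = TRUE)

## Option 1 I am considering: split the file in the shell first
## (strip the header so every piece is plain data), e.g.
##   tail -n +2 bigfile.txt | split -l 60000000 - part_
## then read each piece and stack them back into one data.table
files  <- sort(list.files(pattern = "^part_"))
pieces <- lapply(files, fread, header = FALSE, nThread = 32)
dt     <- rbindlist(pieces)
```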
Any suggestions and comments are appreciated!