
What is the recommended way to read a data.table from an archived file (a zip archive in my case)? One obvious option is to unzip it to a temporary file and then fread() it as usual. I don't want to bother with creating a new file, so instead I use read.table() on an unz() connection and then convert the result with data.table():

mydt <- data.table(read.table(unz(myzipfilename, myfilename)))

This works fine, but read.table() is slow for big files, while fread() can't read from an unz() connection directly. Is there a better solution?

Vasily A
  • You might take a look at the `read_file` function from the `readr` package (one of Hadley Wickham's). I've found it to be faster than base R reads for unzipped files and the documentation indicates that it can read zipped files. – WaltS Oct 26 '15 at 11:40

1 Answer


See: "Read Ziped CSV File with fread". To avoid temporary files, you can use unzip with the -p option, which extracts files to a pipe (stdout) without printing any messages.

You can pass this kind of command to fread:

x = fread('unzip -p test/allRequests.csv.zip')

Or with gunzip:

x = fread('gunzip -cq test/allRequests.csv.gz')

You can also use grep or other tools.
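
As a rough sketch of how the approach above could be wrapped up, the snippet below defines two helpers. The names fread_zip and fread_zip_tmp are hypothetical (they are not part of data.table). The first assumes an unzip binary is on the PATH and uses the cmd argument of fread; the second is a fallback that extracts to a temporary directory with base R's utils::unzip() and cleans up afterwards.

```r
library(data.table)

# Hypothetical helper (not part of data.table): stream a zip archive, or one
# named member of it, into fread() without writing a temporary file.
# Assumes the `unzip` command-line tool is on the PATH.
fread_zip <- function(zipfile, member = NULL, ...) {
  # shQuote() protects paths that contain spaces or shell metacharacters
  command <- paste("unzip -p", shQuote(zipfile))
  if (!is.null(member)) {
    command <- paste(command, shQuote(member))
  }
  fread(cmd = command, ...)
}

# Hypothetical fallback when no `unzip` binary is available: extract one
# member to a temporary directory with utils::unzip(), read it, then delete
# the extracted copy when the function exits.
fread_zip_tmp <- function(zipfile, member, ...) {
  exdir <- tempfile("unz")
  on.exit(unlink(exdir, recursive = TRUE))
  path <- utils::unzip(zipfile, files = member, exdir = exdir)
  fread(path, ...)
}
```

With the file from the examples above, a call would look like x <- fread_zip("test/allRequests.csv.zip"). Extra arguments such as select or nrows are passed through to fread.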

Mirko Ebert
  • It's worth adding that when a preprocessing command is used, the `data.table` developers recommend the `cmd` argument, e.g. `fread(cmd = 'unzip -p test/allRequests.csv.zip')`, for security reasons. – Taz Aug 21 '20 at 14:17
  • I had to copy my `unzip.exe` file to my working R directory for this to work. – kakarot Oct 08 '20 at 14:06
  • unzip or gunzip has to be on your $PATH. – Mirko Ebert Oct 09 '20 at 13:01