I am trying to read a zip file that has 1 csv file in it.

It works great when I know the csv file name but when I just try to extract the zip file alone, it doesn't work.

Here is an example where it works:

zip_file <- "abc.zip"
csv_file <- "abcde.csv"

data <- read.table(unz(zip_file,csv_file), skip = 10, header=T, quote="\"", sep=",")

Here is where it doesn't work when I try to only extract the zip file:

read.table(zip_file, skip = 10, nrows=10, header=T, quote="\"", sep=",")

An error comes up saying:

Error in read.table(attachment_file, skip = 10, nrows = 10, header = T,  : 
  no lines available in input
In addition: Warning messages:
1: In readLines(file, skip) : line 2 appears to contain an embedded nul
2: In readLines(file, skip) : line 3 appears to contain an embedded nul
3: In readLines(file, skip) :
  incomplete final line found on 
'C:\Users\nickk\AppData\Local\Temp\RtmpIrqdl8\file2c9860d62381'

So there is definitely a csv file in the archive: reading works when I include the csv file name, and the error only comes up when I pass the zip file alone.

For context, the reason I do not want to include the csv file name is that I need to read this zip file daily, and the name of the csv file changes every time with no pattern. So my goal is to read the zip file without having to know the name of the csv inside it.

Thanks!

nak5120
  • You don't need to specifically unzip file for [`read.table`](http://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html) (see `file` description). Try: `data <- read.table(zip_file, skip = 10, header=T, quote="\"", sep=",")` – m0nhawk Dec 18 '17 at 14:04
  • Maybe fetch the filename with `unzip(zip_file,list=TRUE)`, then use that filename as your variable csv_file. – Florian Dec 18 '17 at 14:05
  • @Florian that worked great! – nak5120 Dec 18 '17 at 14:07

2 Answers

Why don't you try using unzip to find the filename inside the ZIP archive:

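zipdf <- unzip(zip_file, list = TRUE)
# the following line assumes the archive contains only a single file
# (R vectors are 1-indexed, so the first entry is [1], not [0])
csv_file <- zipdf$Name[1]

# read the entry through an unz() connection, without extracting the archive
your_df <- read.table(unz(zip_file, csv_file), skip = 10, nrows=10, header=T, quote="\"", sep=",")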
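
If the archive might someday hold more than one entry, a slightly more defensive version of the same idea could look like the sketch below. The helper name `read_single_csv_from_zip` and the demo file names are hypothetical, chosen only for illustration. The demo builds its own small archive with `utils::zip()`, which assumes an external `zip` program is available on the PATH.

```r
# Hypothetical helper: read the only CSV inside a zip archive
# without knowing its name in advance.
read_single_csv_from_zip <- function(zip_path, ...) {
  # List the archive contents; nothing is extracted at this point
  entries <- unzip(zip_path, list = TRUE)
  csv_names <- entries$Name[grepl("\\.csv$", entries$Name, ignore.case = TRUE)]
  if (length(csv_names) != 1) {
    stop("Expected exactly one CSV in ", zip_path, ", found ", length(csv_names))
  }
  # unz() gives a connection to the named entry; read.table() opens and closes it
  read.table(unz(zip_path, csv_names), ...)
}

# Self-contained demo: build a small archive in a temporary directory
tmp_dir <- tempfile("zipdemo")
dir.create(tmp_dir)
csv_path <- file.path(tmp_dir, "name_that_changes_daily.csv")
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)), csv_path, row.names = FALSE)
zip_path <- file.path(tmp_dir, "demo.zip")
# -j stores the file without its directory path, -q keeps the zip program quiet
zip(zip_path, files = csv_path, flags = "-j9Xq")

result <- read_single_csv_from_zip(zip_path, header = TRUE, sep = ",", quote = "\"")
print(result)
```

Extra arguments such as `skip = 10` or `nrows = 10` pass straight through `...` to `read.table()`, so the call from the question would only need the zip path swapped in.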
Tim Biegeleisen

If you are open to data.table, you can try:

data.table::fread(cmd = paste('unzip -cq', shQuote(zip_file)), skip = 10)
  • -c: extract the file contents to standard output (stdout) instead of writing them to disk;
  • -q: suppress the messages unzip would otherwise print;
mt1022