I am trying to parse some online weather data with R. The data is a binary file that has been gzipped. An example file is:

ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/V1.0/2005/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101.gz

If I download the file to my computer and manually unzip it, I can easily do the following:

  myFile <- "/tmp/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101"
  to.read <- file(myFile, "rb")
  myPoints <- readBin(to.read, real(), n = 1e6, size = 4, endian = "little")
  close(to.read)

What I would prefer to do is automate the download and unzip along with the read. I thought that would be as simple as the following:

p <- gzcon( url( "ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/V1.0/2005/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101.gz" ) )
myPoints <- readBin(p, real(), n=1e6, size = 4, endian = "little")

This seems to work just dandy, but in the manual step the vector myPoints has length 518400, which is accurate. However, if R handles the download and read as in the second example, I get a vector of a different length every time I run the code. Seriously. I'm not smoking anything. I swear. I run it multiple times and each time the vector is a different length, always less than the expected 518400.

I also tried getting R to download the gzip file using the following:

temp <- tempfile()
download.file("ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/V1.0/2005/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101.gz", temp, mode = "wb")

That often returned a warning that the downloaded file was not the expected size, like the following:

Warning message:
In download.file("ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/V1.0/2005/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101.gz",  :
  downloaded length 162176 != reported length 179058

Any tips you can throw my way that might help me solve this?

-J

JD Long
  • Just a comment for the record, but the automated version using `gzcon()` works fine for me following repeated runs; all gave the correct length. – Gavin Simpson May 27 '11 at 15:54
  • Further testing showed that this works for me if I'm using R from the command prompt, but not from inside RStudio. I think I found an RStudio bug. – JD Long May 27 '11 at 15:58
  • I've reported this as a possible RStudio bug: http://support.rstudio.org/help/discussions/problems/600-conflict-with-rstudio-and-reading-connections – JD Long May 27 '11 at 16:03
  • @JD Long I've just tested your code using RStudio 0.94.48 and it worked OK – Luciano Selzer May 27 '11 at 18:23
  • @lselzer it appears to be specific to RStudio Server running on Ubuntu. – JD Long May 27 '11 at 20:41
  • I see this a lot, reports on "R" that turn out to be some interface to it. Oughta be a law . . . – mdsumner May 27 '11 at 22:43
  • @mdsummer, I think there is a law: Murphy's Law :) – JD Long May 28 '11 at 00:50

1 Answer

Try this:

R> remfname <- "ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/V1.0/2005/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101.gz"
R> locfname <- "/tmp/data.gz"
R> download.file(remfname, locfname)
trying URL 'ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/V1.0/2005/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101.gz'
ftp data connection made, file length 179058 bytes
opened URL
==================================================
downloaded 174 Kb

R> con <- gzcon(file(locfname, "rb"))
R> myPoints <- readBin(con, real(), n=1e6, size = 4, endian = "little")
R> close(con)
R> str(myPoints)
 num [1:518400] 0 0 0 0 0 0 0 0 0 0 ...
R> 
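
If you want the steps above wrapped up for reuse, here is a minimal sketch. The function names `read_gz_floats` and `fetch_gz_floats` and the `expected_n` argument are made up for illustration, and `mode = "wb"` is my addition (it only matters on platforms that distinguish text and binary downloads). The idea is the same as above: download to a local file first, read it through `gzcon()`, and fail loudly if the vector is shorter than expected instead of silently returning a truncated result.

```r
# Hypothetical helper: read little-endian 4-byte floats from a gzipped file on disk.
read_gz_floats <- function(path, n_max = 1e6) {
  con <- gzcon(file(path, "rb"))
  on.exit(close(con))  # close the connection even if readBin() fails
  readBin(con, what = numeric(), n = n_max, size = 4, endian = "little")
}

# Hypothetical wrapper: download to a temp file, read it, and check the length.
fetch_gz_floats <- function(url, expected_n, n_max = 1e6) {
  tmp <- tempfile(fileext = ".gz")
  on.exit(unlink(tmp))  # remove the temp file when done
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  x <- read_gz_floats(tmp, n_max)
  if (length(x) != expected_n) {
    stop("read ", length(x), " values, expected ", expected_n)
  }
  x
}

# Offline check of the reader: write 1000 floats to a gzipped temp file and read them back.
demo <- tempfile(fileext = ".gz")
gz <- gzfile(demo, "wb")
writeBin(as.numeric(1:1000), gz, size = 4, endian = "little")
close(gz)
vals <- read_gz_floats(demo)
```

For the file in the question, the call would be `fetch_gz_floats(remfname, expected_n = 518400)`. I have only sketched this, so treat the length check as a safety net rather than a fix for whatever is truncating the stream.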
Dirk Eddelbuettel