
I am trying to download and open NetCDF files from an open, online database called OPeNDAP. When I download the data files directly from OPeNDAP's server dataset access form, naming the file "MUR_JPL_L4_GLOB_opendap.nc.nc4", I can download and view the data successfully in RStudio.

library("ncdf4")
GHRSST<-nc_open("MUR_JPL_L4_GLOB_opendap.nc.nc4")
print(GHRSST)
nc_close(GHRSST)

Furthermore, when I paste the data access form's Data URL directly into my browser (e.g., "http://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2009/009/20090109090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc.nc4?lat[0:1:17998],lon[0:1:35999],analysed_sst[0:1:0][0:1:17998][0:1:35999]") and name the file "MUR_JPL_L4_GLOB_browser.nc.nc4", I can also download and view the data successfully in RStudio.

library("ncdf4")
GHRSST<-nc_open("MUR_JPL_L4_GLOB_browser.nc.nc4")
print(GHRSST)
nc_close(GHRSST)

When I use the download.file() function within RStudio to download the data directly from the URL above, the download appears to succeed as well.

download.file("http://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2009/009/20090109090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc.nc4?lat[0:1:17998],lon[0:1:35999],analysed_sst[0:1:0][0:1:17998][0:1:35999]","MUR_JPL_L4_GLOB_rstudio.nc.nc4")

However, the file downloaded within RStudio ("MUR_JPL_L4_GLOB_rstudio.nc.nc4") cannot be opened with the nc_open() function from the "ncdf4" package. When I try to open it with the code below, R reports an "Assertion Failed" error and RStudio crashes immediately afterward.

library("ncdf4")
GHRSST<-nc_open("MUR_JPL_L4_GLOB_rstudio.nc.nc4")
ASSERTION FAILED!...

My RStudio version and ncdf4 package are up to date. I have tried the same code in Rgui, with a similar error message and crash. I have also tried this on another computer with the same result, and with a different download function (download() from the "downloader" package), but it failed in the same way. I have also downloaded a small subset of the file in case the large file size was the problem, but this didn't help.

My questions are:

1) Why does opening the file downloaded within RStudio via download.file() crash RStudio, while the files downloaded directly by my browser open properly?

2) Do you know of any fixes that would get me past this problem?

My ultimate goal is to download and process many of these files, which is why downloading all of the data manually using my browser is not a good option.

My sessionInfo() is as follows:

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ncdf4_1.15

loaded via a namespace (and not attached):
[1] tools_3.3.2

Thanks in advance for your help.

1 Answer


I've only just seen this and have been trying to work out the same issue. I am also downloading from the PODAAC FTP server through R, and I was trying a loop using mapply(download.file()). I think my issue was with mapply(): somehow it wasn't building the downloaded files correctly (I also couldn't open files downloaded through RStudio or base R, but they were fine if I downloaded them manually from the FTP site).

The solution that seems to be working for me is to add a second loop that, once you have the file names for an individual directory (I'm downloading across several years, each with its own folder), runs download.file() for each file.

# ftp://podaac-ftp.jpl.nasa.gov/allData/modis/L3/aqua/4um/v2014.0/4km/monthly
# monthly SST data, one folder per year
require(ncdf4)
require(RNetCDF)
require(RCurl)

month <- c("01", "02", "03", "04", "05", "06",
           "07", "08", "09", "10", "11", "12") # months to download
url_year <- seq(2003, 2016, 1)                 # years to download

for (i in 1:length(url_year)) {
  url <- paste0("ftp://podaac-ftp.jpl.nasa.gov/allData/modis/L3/aqua/4um/v2014.0/4km/monthly/", url_year[i], "/")
  # list the directory contents and build full file URLs
  filenames = getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE, crlf = TRUE)
  filenames = paste(url, strsplit(filenames, "\r*\n")[[1]], sep = "")
  filenamesNC = filenames[seq(1, 23, 2)] # subset only the netcdf files
  # loop over filenamesNC, not filenames: indexing filenamesNC past its
  # end yields NA and "scheme not supported in URL 'NA'"
  for (j in 1:length(filenamesNC)) {
    download.file(url = filenamesNC[j],
                  destfile = paste0(url_year[i], "_", month[j], "_sst4_4km.nc"),
                  mode = "wb") # "wb" is essential on Windows for binary files
  }
}
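A likely explanation for the crash described in the question is that download.file() on Windows defaults to a text-mode write, which silently corrupts binary files such as NetCDF; the loop above avoids this by passing mode = "wb". As a minimal sketch (not verified against the questioner's exact setup), the single-file download from the question could be retried the same way, using the question's own URL and file name:

```r
# Sketch: download one NetCDF file with an explicit binary-mode write.
# Without mode = "wb", Windows writes the file in text mode and the
# resulting NetCDF file cannot be opened by nc_open().
library(ncdf4)

url <- paste0("http://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/",
              "data/GDS2/L4/GLOB/JPL/MUR/v4.1/2009/009/",
              "20090109090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc.nc4",
              "?lat[0:1:17998],lon[0:1:35999],",
              "analysed_sst[0:1:0][0:1:17998][0:1:35999]")

download.file(url, destfile = "MUR_JPL_L4_GLOB_rstudio.nc.nc4", mode = "wb")

GHRSST <- nc_open("MUR_JPL_L4_GLOB_rstudio.nc.nc4")
print(GHRSST)
nc_close(GHRSST)
```

On non-Windows platforms mode is effectively ignored for this purpose, which would also explain why the same code can behave differently across machines.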
Jbell
  • Update: the URL part of the loop doesn't seem to work properly: Error in download.file(url = filenamesNC[j], destfile = paste0(url_year[i], : scheme not supported in URL 'NA'. I have tried creating a separate vector of URLs and calling url_list[i], but no luck yet. Strange. – Jbell Jul 11 '17 at 10:30