EDIT - Short question: Does httr have a finalizer that closes the FTP connection?

I'm downloading climate projection files from the FTP server of the NASA NEX project using the httr package.

My script is:

library(httr)

var = c("pr", "tasmin", "tasmax")
rcp = c("rcp45", "rcp85")
mod= c("inmcm4", "GFDL-CM3")
year=c(seq(2040,2080,1))

for (v in var) {
  for (r in rcp) {
    url <- paste0('ftp://ftp.nccs.nasa.gov/BCSD/', r, '/day/atmos/', v, '/r1i1p1/v1.0/')
    for (m in mod) {
      for (y in year) {
        nfile <- paste0(v, '_day_BCSD_', r, '_r1i1p1_', m, '_', y, '.nc')
        url1 <- paste0(url, nfile)
        destfile <- paste0('mypath', r, '/', v, '/', nfile)
        GET(url=url1, authenticate(user='NEXGDDP', password='', type = "basic"), write_disk(path=destfile, overwrite = FALSE))
        Sys.sleep(0.5)
      }
    }
  }
}

After a while, the server stops my connection with the following error: "421 There are too many connections from your internet address".

I read here that this is due to the number of open connections and that I should close them at each iteration (I'm not sure this really makes sense, though!). Is there a way to close the FTP connection with the httr package?

Nemesi
  • That error message can also mean that you're making multiple connections too quickly, even if all of them are being closed. If you can wait a lot longer for your code to run, try adding `Sys.sleep(2)` in the loop to see if it solves the problem. – Andrew Brēza Jul 18 '17 at 12:59
  • @Andrew Brēza: It makes sense, but I left three sessions of R running overnight, downloading from the same ftp, and they worked just fine with `Sys.sleep(0.5)`. Now I have only one session open, and I cannot even access the ftp from my browser (same error code). – Nemesi Jul 18 '17 at 13:06
  • Have you checked whether the ftp blocks an IP after a certain number of connections/requests? – Colin FAY Jul 18 '17 at 13:07
  • @Colin FAY Well, the repository is designed for automated download, using wget or other similar tools (https://cds.nccs.nasa.gov/nex-gddp/). I don't think they would limit the number of connections. Usually, file transfers from this kind of repository are done to interface with university/research centers' clusters. I would be very surprised. – Nemesi Jul 18 '17 at 13:12
  • In many of their applications they specify, for instance [here](https://www.hq.nasa.gov/office/itcd/networking-ftp.html), "Close the FTP connection and exit the FTP client application". My guess is that `httr` opens a new connection at each iteration and does not close the old one. – Nemesi Jul 18 '17 at 13:16

2 Answers

Proposed Solution (Summary answer)

Proposed solution - set the maximum number of connections to the ftp server for httr

> config(CURLOPT_MAXCONNECTS=5)
<request>
Options:
* CURLOPT_MAXCONNECTS: 5

Explanation

Preamble:

The httr package is a wrapper for curl. This is important as it abstracts the curl interface. In this case, we wish to modify the curl behaviour by modifying curl's configuration via the httr abstraction.

  • httr by default handles automatic connection sharing across requests to the same website (by default, curl handles are managed automatically), cookies are maintained across requests, and an up-to-date root-level SSL certificate store is used.

In this context we do not control the FTP server, only the client request to the server. Hence, we can modify curl's default behaviour via httr::config() to reduce the number of simultaneous FTP requests.
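The connection sharing mentioned above comes from httr's handle pool, which keeps one curl handle per scheme/host/port combination and reuses it across requests. As a rough sketch of the "open, get, close, sleep" cycle, `handle_reset()` can drop the pooled handle after each download, so that its connection should be released once the handle is garbage-collected. The helper name `download_and_release` is hypothetical, and whether this avoids the 421 error on this particular server is an assumption, not something verified here.

```r
library(httr)

# Hypothetical helper: download one file, then discard the pooled handle
# for that host so the next request starts from a fresh connection.
download_and_release <- function(url, destfile, pause = 1) {
  resp <- GET(
    url,
    authenticate(user = "NEXGDDP", password = "", type = "basic"),
    write_disk(path = destfile, overwrite = FALSE)
  )
  status <- status_code(resp)
  rm(resp)           # the response object also references the handle
  handle_reset(url)  # remove this host's handle from httr's pool
  gc()               # give R a chance to finalize the dropped handle
  Sys.sleep(pause)   # pause between requests
  invisible(status)
}
```

Inside the download loop this would be called as `download_and_release(url1, destfile)` in place of the bare `GET()` call.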

Interrogate httr curl ftp options

To retrieve current options we can execute the following command:

> httr_options("ftp")
                       httr                         libcurl    type
49              ftp_account             CURLOPT_FTP_ACCOUNT  string
50  ftp_alternative_to_user CURLOPT_FTP_ALTERNATIVE_TO_USER  string
51  ftp_create_missing_dirs CURLOPT_FTP_CREATE_MISSING_DIRS integer
52           ftp_filemethod          CURLOPT_FTP_FILEMETHOD integer
53     ftp_response_timeout    CURLOPT_FTP_RESPONSE_TIMEOUT integer
54         ftp_skip_pasv_ip        CURLOPT_FTP_SKIP_PASV_IP integer
55              ftp_ssl_ccc             CURLOPT_FTP_SSL_CCC integer
56             ftp_use_eprt            CURLOPT_FTP_USE_EPRT integer
57             ftp_use_epsv            CURLOPT_FTP_USE_EPSV integer
58             ftp_use_pret            CURLOPT_FTP_USE_PRET integer
59                  ftpport                 CURLOPT_FTPPORT  string
60               ftpsslauth              CURLOPT_FTPSSLAUTH integer
196            tftp_blksize            CURLOPT_TFTP_BLKSIZE integer 

To access the libcurl documentation we can call curl_docs("CURLOPT_FTP_ACCOUNT").

Modifying httr configuration of requests

You can either modify the httr global curl configuration using set_config() or simply wrap your request using with_config(). In this instance we wish to limit the maximum number of connections to the FTP server.

thus:

httr_options("max")
                    httr                      libcurl    type
95  max_recv_speed_large CURLOPT_MAX_RECV_SPEED_LARGE  number
96  max_send_speed_large CURLOPT_MAX_SEND_SPEED_LARGE  number
97           maxconnects          CURLOPT_MAXCONNECTS integer
98           maxfilesize          CURLOPT_MAXFILESIZE integer
99     maxfilesize_large    CURLOPT_MAXFILESIZE_LARGE  number
100            maxredirs            CURLOPT_MAXREDIRS integer 

We can now look up curl_docs("CURLOPT_MAXCONNECTS"); this is the option we want.

Now we have to set it.

> config(CURLOPT_MAXCONNECTS=5)
<request>
Options:
* CURLOPT_MAXCONNECTS: 5
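
As the printed `<request>` shows, `config()` on its own only builds an option object. As far as I understand httr, the option takes effect once that object is passed to `set_config()`, `with_config()`, or a request function such as `GET()`. The sketch below uses the lowercase httr name `maxconnects` from the `httr_options("max")` table; the helper name `fetch_file` is hypothetical and the value 5 is only illustrative.

```r
library(httr)

# Build the option object using httr's own option name.
cfg <- config(maxconnects = 5)

# Option A: apply it to every subsequent httr request in this session.
set_config(cfg)

# Option B: apply it to a single request only (hypothetical helper).
fetch_file <- function(url, destfile) {
  with_config(cfg, GET(
    url,
    authenticate(user = "NEXGDDP", password = "", type = "basic"),
    write_disk(path = destfile, overwrite = FALSE)
  ))
}
```

Only one of the two options is needed; the global setting is simpler for a one-off download script, while `with_config()` keeps the change local to the wrapped request.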

ref: https://cran.r-project.org/web/packages/httr/httr.pdf


Alternate RCurl Approach

I know this is slightly superfluous, but I included it to provide an alternate approach. Why? There is a subtle issue here due to network bandwidth: running multiple simultaneous FTP sessions may be slower than running them in series. My alternate approach would be to run the R script below or to use curl directly from the Unix shell command line.

require(RCurl)
require(stringr)
opts = curlOptions(userpwd = "NEXGDDP:", netrc = TRUE)

rcpDir  = c("rcp45", "rcp85")
varDir  = c("pr", "tasmin", "tasmax")

for (rcp in rcpDir ) {
  for (var in varDir ) {
    url <- paste0( 'ftp://ftp.nccs.nasa.gov/BCSD/', rcp, '/day/atmos/', var, '/r1i1p1/v1.0/', sep = '')
    print(url)
    filenames = getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE, .opts = opts)
    filelist <- unlist(str_split(filenames, "\n"))
    filelist <- filelist[!filelist == ""]
    filesavg <- str_detect(filelist,
                          "inmcm4_20[4-8]0|GFDL-CM3_20[4-8]0")
    filesavg <- filelist[filesavg]
    filesavg
    urlsavg <- str_c(url, filesavg)

    for (file in seq_along(urlsavg)) {
      fname <- str_c("data/", filesavg[file])
      if (!file.exists(fname)) {
        print(urlsavg[file])
        bin <- getBinaryURL(urlsavg[file], .opts = opts)
        writeBin(bin, fname)
        Sys.sleep(1)
      }
    }
  }
}
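
One caveat with the pattern in this script: `20[4-8]0` only matches years ending in 0 (2040, 2050, ..., 2080), so it selects one file per decade. If every year from 2040 to 2080 is wanted, as in the question, a pattern along the following lines might be used instead. This is a sketch using base R's `grepl()`; the entries in `files` are made-up names that follow the server's naming scheme.

```r
# Made-up file names following the naming scheme on the server.
files <- c(
  "pr_day_BCSD_rcp45_r1i1p1_inmcm4_2039.nc",
  "pr_day_BCSD_rcp45_r1i1p1_inmcm4_2040.nc",
  "pr_day_BCSD_rcp45_r1i1p1_inmcm4_2047.nc",
  "pr_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2080.nc",
  "pr_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2081.nc",
  "pr_day_BCSD_rcp45_r1i1p1_CCSM4_2050.nc"
)

# Match the two models and any year in 2040..2080:
# 204x-207x via [4-7][0-9], plus 2080 on its own.
pattern <- "_(inmcm4|GFDL-CM3)_20([4-7][0-9]|80)\\.nc$"

wanted <- files[grepl(pattern, files)]
print(wanted)
```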

Code Output

[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_inmcm4_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_inmcm4_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_inmcm4_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_inmcm4_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_inmcm4_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_inmcm4_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_inmcm4_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_inmcm4_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_inmcm4_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_inmcm4_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_inmcm4_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_inmcm4_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_inmcm4_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_inmcm4_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_inmcm4_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_inmcm4_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_inmcm4_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_inmcm4_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_inmcm4_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_inmcm4_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_inmcm4_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_inmcm4_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_inmcm4_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_inmcm4_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_inmcm4_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_inmcm4_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_inmcm4_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_inmcm4_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_inmcm4_2080.nc"
Technophobe01
  • Hi Technophobe. Thanks for your trial, but this is not exactly what I was looking for. My problem is mainly to understand how I should handle the multiple connections issue with my code. It would just work perfectly if I had the possibility to state: 1 - open connection; 2 - get the file; 3 - close connection; 4 - sleep; and again. `httr` opens multiple connections at the same time, even if it is using only one. This is what puzzles me. – Nemesi Jul 25 '17 at 08:07
  • Nemesi I updated the answer to reflect your need to use `httr` and provided an explanation of how to set `httr` `curl` setting via `httr::config()`. I hope that helps point you in the right direction. – Technophobe01 Jul 25 '17 at 21:45
  • It's great to be able to configure the basic settings. I also tested the alternative approach, and it does not really work as it should. A combination of my script and yours does. – Nemesi Jul 26 '17 at 12:45
  • Nemesi - glad to hear the write-up helped. Can you update your answer to show the final solution code? I'd love to see the final code. Take care and have a great day. – Technophobe01 Jul 26 '17 at 15:22
  • I put the final code in an answer integrating your key suggestions in my code. Thanks for your help again – Nemesi Jul 27 '17 at 07:13
(Not sure this should be an answer, but I cannot add all this in a comment)

To sum up, two alternative solutions worked combining my approach with the one proposed by Technophobe. I put the final code of both here in case it might be helpful for someone experiencing the same issues.

httr approach:

library(httr)
# configure a proxy, in case you are on an office/university network
set_config(use_proxy(url='http://~in_case_you_need_a_proxy', port=paste_here_port_no))
# limit the number of simultaneous connections as suggested by Technophobe01
#default is 5
config(CURLOPT_MAXCONNECTS=3)

var = c("pr","tasmax","tasmin")
rcp = c("rcp45", "rcp85")
mod= c("inmcm4", "GFDL-CM3")
year=c(seq(2036,2050,1), seq(2061,2080,1))
for (v in var) {
  for (r in rcp) {
    url <- paste0('ftp://ftp.nccs.nasa.gov/BCSD/', r, '/day/atmos/', v, '/r1i1p1/v1.0/')
    for (m in mod) {
      for (y in year) {
        nfile <- paste0(v, '_day_BCSD_', r, '_r1i1p1_', m, '_', y, '.nc')
        url1 <- paste0(url, nfile)
        destfile <- paste0('D:/destination_path/', r, '/', v, '/', nfile)
        GET(url=url1, authenticate(user='NEXGDDP', password='', type = "basic"), write_disk(path=destfile, overwrite = FALSE))
        gc()
        Sys.sleep(1)
      }
    }
  }
}

Alternative approach using RCurl

library(RCurl)
opts = curlOptions(proxy='http://~in_case_you_need_a_proxy:paste_here_port_no', userpwd = "NEXGDDP:", netrc = TRUE)

var = c("pr","tasmax","tasmin")
rcp = c("rcp45", "rcp85")
mod= c("inmcm4", "GFDL-CM3")
year=c(seq(2036,2050,1), seq(2061,2080,1))
for (v in var) {
  for (r in rcp) {
    url <- paste0('ftp://ftp.nccs.nasa.gov/BCSD/', r, '/day/atmos/', v, '/r1i1p1/v1.0/')
    for (m in mod) {
      for (y in year) {
        nfile <- paste0(v, '_day_BCSD_', r, '_r1i1p1_', m, '_', y, '.nc')
        url1 <- paste0(url, nfile)
        destfile <- paste0('D:/destination_path/', r, '/', v, '/', nfile)
        bin <- getBinaryURL(url1, .opts = opts)
        writeBin(bin, destfile)
        Sys.sleep(1)
        gc()
      }
    }
  }
}

Both approaches were tested and worked. The second might still be affected by the 421 error, but only in a very limited number of cases (I downloaded more than 900 files for a total of about 600 GB). I hope this is a good reference for other people working in this field.
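
For the occasional 421 error that may still occur, one possible safeguard is to retry the failed download after a growing pause instead of letting the whole loop stop. The following is a minimal base-R sketch, not part of the tested scripts above; `with_retry`, `fetch`, `max_tries`, and `base_wait` are hypothetical names, and it assumes the download call signals an R error when the server refuses the connection.

```r
# Retry a download step with a growing pause between attempts.
# `fetch` is any zero-argument function that performs one download and
# signals an error on failure.
with_retry <- function(fetch, max_tries = 5, base_wait = 2) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt == max_tries) {
      stop(result)  # give up and re-signal the last error
    }
    # Wait base_wait, 2 * base_wait, 4 * base_wait, ... seconds.
    Sys.sleep(base_wait * 2^(attempt - 1))
  }
}

# Possible use inside the RCurl loop (not run here):
# bin <- with_retry(function() getBinaryURL(url1, .opts = opts))
```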

Nemesi