
Is anyone keen on briefly explaining how to use the RCurl package (or any other R package) for downloading files from the following FTP server...?

http://hermes.acri.fr/index.php?class=ftp_access

I'm totally new to this field and some impetus is certainly needed...

Thanks a lot...

Andrew Brēza
Robert
  • What parts of the RCurl built in manual pages did not have the info you needed? Knowing that might help a contributor PR into it with better examples. Also — https://stackoverflow.com/questions/22235421/using-r-to-download-newest-files-from-ftp-server — was literally result #1 from a google search for ‘rcurl ftp download’, which seems to have quite a bit of impetus baked in. – hrbrmstr Oct 18 '18 at 11:33
  • Seems that you're not "keen on briefly explaining how to use the RCurl package ...". Thanks anyway... – Robert Oct 18 '18 at 12:04
  • The RCurl manual pages already _briefly_ explain how to use RCurl. http://www.omegahat.net/RCurl/installed/RCurl/html/getURL.html even has an example of how to use `getURL()` to perform an authenticated download. – hrbrmstr Oct 18 '18 at 12:33

1 Answer


Well, tbh, you've done no research and want folks to give you special treatment, which is fine, but not going to get you very far on SO. There are tons of questions on SO for RCurl and loads of web sites that specifically talk about how to use it in the context of FTP downloads.

But, the following might help someone who has done some research and is truly stuck, plus will also show how to use the more modern curl and httr packages.

On top of some RCurl tutoring you kinda also expected folks to register for that site (since one might have assumed there were idiosyncrasies in that site's FTP server that were causing issues with RCurl…I mean, we have no context so that's as valid an assumption as any).

Put these in ~/.Renviron and restart your R session:

ACRI_FTP_USERNAME=your-username
ACRI_FTP_PASSWORD=your-password

Do some basic research (it's in the manuals on the R-Project site) on getting environment variables into R if you've not done that before.

If you don't do at least that you're putting bare credentials into scripts which is horribad for security. There are other ways to manage "secrets" more formally, but I suspect these FTP credentials aren't exactly "super-secret" bits of info. Doing this can also make any scripts more generic (i.e. others can use them if they follow the same pattern and use their own creds).
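If you want a script to fail early when those variables are missing, something like the following might do. It's a minimal sketch: `get_acri_credentials()` is a made-up helper name, not something from curl or httr, and it only assumes the two variable names used above.

```r
# Hypothetical helper (not part of curl/httr): read the FTP credentials
# from the environment and stop with a clear message if either is unset.
get_acri_credentials <- function() {
  username <- Sys.getenv("ACRI_FTP_USERNAME", unset = "")
  password <- Sys.getenv("ACRI_FTP_PASSWORD", unset = "")

  if (!nzchar(username) || !nzchar(password)) {
    stop(
      "Set ACRI_FTP_USERNAME and ACRI_FTP_PASSWORD in ~/.Renviron and restart R.",
      call. = FALSE
    )
  }

  list(username = username, password = password)
}
```

Calling it once at the top of a script means a missing credential shows up as a readable error rather than as a failed FTP login further down.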

We'll use curl and httr:

library(curl)
library(httr)

You may not want to use your browser to look at directory listings and browsers may stop supporting FTP soon (Mozilla is abandoning support for reading RSS feeds and neither Chrome nor Firefox can read Gopher sites, so you never know). Browsers also tend to be super slow with FTP things for some reason.

We'll make a function to make it easier to do directory listings:

get_dir_listing <- function(path = "/") {
  curl_fetch_memory(
    paste0("ftp://ftp.hermes.acri.fr", path),
    new_handle(
      username = Sys.getenv("ACRI_FTP_USERNAME"),
      password = Sys.getenv("ACRI_FTP_PASSWORD"),
      dirlistonly=TRUE
    )
  ) -> res

  strsplit(readBin(res$content, "character"), "\n")[[1]]

}
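One detail in the function above: it splits the response on "\n" only. That worked for this server, but I'm assuming some FTP servers end lines with CRLF instead, so a slightly more defensive parsing step could look like the sketch below. `parse_dir_listing()` is a hypothetical name, and the input here is a made-up raw vector rather than a real server response.

```r
# Hypothetical helper: turn the raw bytes of a directory listing into a
# character vector of entry names, tolerating LF or CRLF line endings.
parse_dir_listing <- function(raw_content) {
  txt <- rawToChar(raw_content)
  entries <- strsplit(txt, "\r?\n")[[1]]
  entries[nzchar(entries)]  # drop any empty strings left by blank lines
}

# Simulated response body (assumption: CRLF-terminated names)
parse_dir_listing(charToRaw("GLOB\r\nanimation\r\nOSS2015\r\nEURO\r\n"))
## [1] "GLOB"      "animation" "OSS2015"   "EURO"
```

Inside `get_dir_listing()` this would replace the last line, i.e. `parse_dir_listing(res$content)`.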

Now we can do the following (we'll walk down one branch of the tree; note that the trailing slashes matter):

get_dir_listing()
## [1] "GLOB"      "animation" "OSS2015"   "EURO"     

get_dir_listing("/GLOB/")
## [1] "meris"   "viirsn"  "merged"  "olcia"   "modis"   "seawifs"

get_dir_listing("/GLOB/meris/")
## [1] "month" "8-day" "day"  

get_dir_listing("/GLOB/meris/month/")
## [1] "2011" "2002" "2006" "2012" "2005" "2009" "2004" "2008" "2007" "2010" "2003"

get_dir_listing("/GLOB/meris/month/2011/")
## [1] "09" "05" "01" "12" "06" "02" "11" "03" "10" "07" "08" "04"

get_dir_listing("/GLOB/meris/month/2011/09/")
## [1] "01"

Jackpot!

get_dir_listing("/GLOB/meris/month/2011/09/01/")
##  [1] "L3b_20110901-20110930__GLOB_4_AV-MER_KD490-LEE_MO_00.nc"       
##  [2] "L3m_20110901-20110930__GLOB_25_AV-MER_ZHL_MO_00.nc"            
##  [3] "L3b_20110901-20110930__GLOB_4_AV-MER_ZSD_MO_00.nc"             
##  [4] "L3m_20110901-20110930__GLOB_100_AV-MER_ZSD_MO_00.nc"           
##  [5] "L3b_20110901-20110930__GLOB_4_AV-MER_A865_MO_00.nc"            
##  [6] "L3m_20110901-20110930__GLOB_100_AV-MER_A865_MO_00.nc"          
##  [7] "L3m_20110901-20110930__GLOB_25_AV-MER_CHL1_MO_00.png"          
##  [8] "L3m_20110901-20110930__GLOB_25_AV-MER_CF_MO_00.png"            
##  [9] "L3m_20110901-20110930__GLOB_25_AV-MER_NRRS443_MO_00.png"       
## [10] "L3m_20110901-20110930__GLOB_4_AV-MER_CHL-OC5_MO_00.nc"         
## [11] "L3m_20110901-20110930__GLOB_100_AV-MER_KDPAR_MO_00.nc"         
## [12] "L3b_20110901-20110930__GLOB_4_AV-MER_NRRS670_MO_00.nc"         
## [13] "L3m_20110901-20110930__GLOB_25_AV-MER_NRRS490_MO_00.png"       
## [14] "L3b_20110901-20110930__GLOB_4_AV-MER_NRRS412_MO_00.nc"         
## [15] "L3m_20110901-20110930__GLOB_4_AV-MER_A865_MO_00.nc"            
## [16] "L3m_20110901-20110930__GLOB_4_AV-MER_NRRS490_MO_00.nc"         
## [17] "L3m_20110901-20110930__GLOB_25_AV-MER_KD490_MO_00.png"         
## [18] "L3m_20110901-20110930__GLOB_4_GSM-MER_CHL1_MO_00.nc"           
## [19] "L3b_20110901-20110930__GLOB_4_AV-MER_T550_MO_00.nc"            
## [20] "L3m_20110901-20110930__GLOB_25_AV-MER_CHL-OC5_MO_00.png"       
## [21] "L3m_20110901-20110930__GLOB_25_AV-MER_ZSD-DORON_MO_00.nc"  
## .. there are a lot of them
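With that many entries you may want to narrow the listing down before picking a file. Here's a rough sketch using only base R; `filter_listing()` is a hypothetical helper, and `files` is just a handful of names copied from the output above rather than a live listing.

```r
# A few entries copied from the listing above
files <- c(
  "L3b_20110901-20110930__GLOB_4_AV-MER_KD490-LEE_MO_00.nc",
  "L3m_20110901-20110930__GLOB_25_AV-MER_CHL1_MO_00.png",
  "L3m_20110901-20110930__GLOB_4_AV-MER_CHL-OC5_MO_00.nc",
  "L3m_20110901-20110930__GLOB_4_GSM-MER_CHL1_MO_00.nc"
)

# Hypothetical helper: keep entries with the given extension and,
# optionally, only those containing a fixed (non-regex) substring.
filter_listing <- function(entries, ext = "nc", contains = "") {
  keep <- endsWith(entries, paste0(".", ext))
  if (nzchar(contains)) {
    keep <- keep & grepl(contains, entries, fixed = TRUE)
  }
  entries[keep]
}

filter_listing(files, ext = "nc", contains = "_CHL1_")
## [1] "L3m_20110901-20110930__GLOB_4_GSM-MER_CHL1_MO_00.nc"
```

In practice you'd pass the result of `get_dir_listing("/GLOB/meris/month/2011/09/01/")` instead of the hand-copied vector.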

Now you probably want to download one of them. I never work with .nc files myself, but I know they are generally huge because I've read and answered a lot of SO questions about them.

We'll use httr for the download as it takes care of a bunch of things for us:

httr::GET(
  url = "ftp://ftp.hermes.acri.fr/GLOB/meris/month/2011/09/01/L3m_20110901-20110930__GLOB_4_GSM-MER_CHL1_MO_00.nc",
  httr::authenticate(Sys.getenv("ACRI_FTP_USERNAME"), Sys.getenv("ACRI_FTP_PASSWORD")),
  httr::write_disk("~/Data/L3m_20110901-20110930__GLOB_4_GSM-MER_CHL1_MO_00.nc"),
  httr::progress()
) -> res

httr::stop_for_status(res)

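You can safely ignore the warnings and diagnostics. httr is built for HTTP, so it tries (and fails) to parse the FTP server's responses as HTTP headers: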

## Warning messages:
## 1: In parse_http_status(lines[[1]]) :
##   NAs introduced by coercion to integer range
## 2: Failed to parse headers:
## 229 Entering Extended Passive Mode (|||28926|)
## 200 Type set to I
## 213 92373747
## 150 Opening BINARY mode data connection for L3m_20110901-20110930__GLOB_4_GSM-MER_CHL1_MO_00.nc (92373747 bytes)
## 226 Transfer complete

The download itself is fine; the file command recognizes the file's magic bytes:

$ file L3m_20110901-20110930__GLOB_4_GSM-MER_CHL1_MO_00.nc
L3m_20110901-20110930__GLOB_4_GSM-MER_CHL1_MO_00.nc: Hierarchical Data Format (version 5) data
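If you'd rather do that sanity check from R than from the shell, here's a minimal sketch. `is_hdf5()` is a hypothetical helper; it assumes the HDF5 signature sits at the very start of the file, which is the usual case but not guaranteed by the format (a user block can push it further in).

```r
# Hypothetical helper: TRUE if the file starts with the 8-byte HDF5
# signature (0x89, "HDF", CR, LF, 0x1A, LF).
is_hdf5 <- function(path) {
  signature <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))
  header <- readBin(path.expand(path), what = "raw", n = length(signature))
  identical(header, signature)
}

# Usage (assumes the file was saved to the path used in the download step):
# nc_path <- "~/Data/L3m_20110901-20110930__GLOB_4_GSM-MER_CHL1_MO_00.nc"
# is_hdf5(nc_path)
# file.size(path.expand(nc_path)) == 92373747  # size the server reported above
```

The size comparison uses the byte count from the `213` line in the diagnostics, so a truncated transfer would show up there even if the header looks right.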

Hopefully this helps someone who is truly stuck, since there is (as stated) loads of content on SO and elsewhere about how to authenticate to FTP servers, perform directory traversals and download content. This is now one more addition to that corpus.

hrbrmstr
  • Well, this is certainly what I was dreaming of - truly a masterpiece of an answer. Thanks a lot @hrbrmstr - that made my day... – Robert Oct 18 '18 at 12:57
  • If your department uses that site a lot, drop a comment and I can wrap it into a package for easier use. – hrbrmstr Oct 18 '18 at 12:59