Retrieve the list of files from a URL

Question

I would like to get a list of all the files available at this address: http://www1.ncdc.noaa.gov/pub/data/cmb/drought/weekly-palmers/2005/ (publicly available data from the NOAA).

It would be some sort of "list.files" for a specific URL. I started to take a look at RCurl, but all I could get was the HTML code of the URL.

All I get are plain text files, all data. How did you get the HTML code? — Pete Houston, Oct 17 '12 at 09:22
I used this (http://stackoverflow.com/questions/5227444/recursively-ftp-download-then-extract-gz-files) as a basis. — user1752610, Oct 17 '12 at 09:25

score 4 · Accepted Answer · answered Oct 17 '12 at 09:26


In this case you can simply use readHTMLTable:

library(XML)  # provides readHTMLTable
readHTMLTable("http://www1.ncdc.noaa.gov/pub/data/cmb/drought/weekly-palmers/2005/", 
              skip.rows=1:2)[[1]]$Name -> file.list

Then to create a list of paths:

paste("http://www1.ncdc.noaa.gov/pub/data/cmb/drought/weekly-palmers/2005/", 
      file.list[!is.na(file.list)], sep="") -> path.list
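
If you already have the page's HTML as a string (for example, what RCurl returned in the question), a rough alternative is to pull the link targets out yourself. This is only a minimal sketch in base R: the helper name list_links, the sample HTML, and the file names in it are made up for illustration, and a regular expression is a crude stand-in for a real HTML parser, so it assumes a simple directory listing with double-quoted href attributes.

```r
# Hypothetical helper (not part of any package): extract file links
# from HTML text using base R only.
list_links <- function(html) {
  # Find every href="..." attribute in the text
  m <- regmatches(html, gregexpr('href="[^"]*"', html, ignore.case = TRUE))[[1]]
  # Strip the leading href=" and the trailing quote
  links <- sub('"$', "", sub('^href="', "", m, ignore.case = TRUE))
  # Drop column-sort links (?C=N;O=D), absolute paths and subdirectories
  links[!grepl("^\\?|^/|/$", links)]
}

# Made-up sample resembling a directory listing (file names are invented)
html <- paste0(
  '<a href="?C=N;O=D">Name</a>',
  '<a href="/pub/data/cmb/drought/weekly-palmers/">Parent Directory</a>',
  '<a href="palmer010205.txt">palmer010205.txt</a>',
  '<a href="palmer010905.txt">palmer010905.txt</a>'
)

base.url <- "http://www1.ncdc.noaa.gov/pub/data/cmb/drought/weekly-palmers/2005/"
path.list <- paste0(base.url, list_links(html))
```

Because this works on text rather than on a parsed table, it may also help on pages where readHTMLTable finds no table, although the filter on the last line of the helper would probably need adjusting for other listing formats.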


plannapus

This doesn't seem to work here: `https://vip.arizona.edu/vipdata/V4/DATAPOOL/PHENOLOGY/`. Calling: `readHTMLTable("https://vip.arizona.edu/vipdata/V4/DATAPOOL/PHENOLOGY/", skip.rows=1:2)[[1]]$Name -> file.list` returns: `Error in XML::readHTMLTable("https://vip.arizona.edu/vipdata/V4/DATAPOOL/PHENOLOGY/", : subscript out of bounds In addition: Warning message: XML content does not seem to be XML: 'https://vip.arizona.edu/vipdata/V4/DATAPOOL/PHENOLOGY/' ` – colin Sep 21 '18 at 15:13
