
This question was answered here (Web scraping pdf files from HTML), but the solution doesn't work for me, either on my target URL or on the OP's target URL. Since I'm not supposed to ask this question as an answer to the earlier post, I'm starting a new question.

My code follows the OP's exactly, and the error message I receive is:

Error in download.file(links[i], destfile = save_names[i]) : invalid 'url' argument

The code I'm using is:

install.packages("RCurl")
install.packages("XML")
library(XML)
library(RCurl)

url    <- "https://www.bot.or.th/English/MonetaryPolicy/Northern/EconomicReport/Pages/Releass_Economic_north.aspx"
page   <- getURL(url)
parsed <- htmlParse(page)

# collect every href on the page, then keep only the PDF links
links  <- xpathSApply(parsed, path = "//a", xmlGetAttr, "href")
inds   <- grep("*.pdf", links)
links  <- links[inds]

# file names: everything after the last "/" in each link
regex_match <- regexpr("[^/]+$", links)
save_names  <- regmatches(links, regex_match)

for (i in seq_along(links)) {
  download.file(links[i], destfile = save_names[i])
  Sys.sleep(runif(1, 1, 5))
}

Any help much appreciated! Thanks

IanLux

  • It would be good to put the packages at the beginning of the script, so this becomes a reproducible example. – igorkf Feb 28 '19 at 12:43
  • Done as suggested @igorkf – IanLux Feb 28 '19 at 14:34
  • Solved! I don't know *why* this works, but it does. I have swapped the for loop for the following code and it works: Map(function(u, d) download.file(u, d, mode = 'wb'), links, save_names) – IanLux Feb 28 '19 at 15:28

1 Answer


Solved! I don't know why this works, but it does. I have swapped the for loop for the following code and it works:

Map(function(u, d) download.file(u, d, mode = 'wb'), links, save_names)
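One plausible explanation (an assumption, not verified against the OP's page): if any `<a>` tag on the page lacks an href, xmlGetAttr returns NULL for it, and xpathSApply then cannot simplify its result to a character vector, so `links` comes back as a list. Single-bracket indexing `links[i]` then yields a one-element *list*, which download.file rejects with "invalid 'url' argument", whereas Map passes each underlying element (as `[[` would) to the function, so download.file receives a plain character string. The `mode = 'wb'` part matters separately: on Windows, writing a binary file such as a PDF in the default text mode corrupts it. A minimal sketch with made-up URLs:

```r
# Simulated result of xpathSApply when some hrefs are missing:
# the NULLs prevent simplification, so this is a list, not a character vector.
raw   <- list("http://example.com/a.pdf", NULL, "http://example.com/b.pdf")
links <- raw[!vapply(raw, is.null, logical(1))]   # still a list after filtering

is.character(links[1])    # FALSE - a one-element list; download.file() would
                          # fail here with "invalid 'url' argument"
is.character(links[[1]])  # TRUE - the string itself

# Map() hands each list element (the string) to the function, so every
# call sees a plain character vector of length 1:
classes <- Map(function(u) class(u), links)
print(classes)            # each element is "character"
```

Converting up front with `links <- unlist(links)` would likely make the original for loop work as well, since `links[i]` on a character vector is itself a character string; `mode = 'wb'` is still worth keeping for the PDFs.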
IanLux