
This question was answered here (Web scraping pdf files from HTML), but the solution doesn't work for me, either on my target URL or on the OP's target URL. Since I'm not supposed to ask this question as an answer to the earlier post, I'm starting a new question.

My code follows the OP's exactly, and the error message I receive is:

Error in download.file(links[i], destfile = save_names[i]) : invalid 'url' argument

The code I'm using is:

install.packages("RCurl")
install.packages("XML")
library(XML)
library(RCurl)

url    <- "https://www.bot.or.th/English/MonetaryPolicy/Northern/EconomicReport/Pages/Releass_Economic_north.aspx"
page   <- getURL(url)
parsed <- htmlParse(page)

# collect every href on the page, then keep only the PDF links
links  <- xpathSApply(parsed, path = "//a", xmlGetAttr, "href")
inds   <- grep("*.pdf", links)
links  <- links[inds]

# file names: everything after the last "/" in each link
regex_match <- regexpr("[^/]+$", links)
save_names  <- regmatches(links, regex_match)

for (i in seq_along(links)) {
  download.file(links[i], destfile = save_names[i])
  Sys.sleep(runif(1, 1, 5))
}

Any help much appreciated! Thanks

IanLux

  • It would be good to put the packages at the beginning of the script, so this becomes a reproducible example. – igorkf Feb 28 '19 at 12:43
  • Done as suggested @igorkf – IanLux Feb 28 '19 at 14:34
  • Solved! I don't know *why* this works, but it does. I have swapped the for loop for the following code and it works: Map(function(u, d) download.file(u, d, mode = 'wb'), links, save_names) – IanLux Feb 28 '19 at 15:28

1 Answer


Solved! I don't know why this works, but it does. I have swapped the for loop for the following code and it works:

Map(function(u, d) download.file(u, d, mode = 'wb'), links, save_names)
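One plausible explanation (an assumption, not verified against the OP's page): if any `<a>` tag on the page lacks an href, xmlGetAttr returns NULL for it, and xpathSApply then cannot simplify its result to a character vector, so `links` comes back as a list. Single-bracket indexing `links[i]` then yields a one-element *list*, which download.file rejects with "invalid 'url' argument", whereas Map passes each underlying element (as `[[` would) to the function, so download.file receives a plain character string. The `mode = 'wb'` part matters separately: on Windows, writing a binary file such as a PDF in the default text mode corrupts it. A minimal sketch with made-up URLs:

```r
# Simulated result of xpathSApply when some hrefs are missing:
# the NULLs prevent simplification, so this is a list, not a character vector.
raw   <- list("http://example.com/a.pdf", NULL, "http://example.com/b.pdf")
links <- raw[!vapply(raw, is.null, logical(1))]   # still a list after filtering

is.character(links[1])    # FALSE - a one-element list; download.file() would
                          # fail here with "invalid 'url' argument"
is.character(links[[1]])  # TRUE - the string itself

# Map() hands each list element (the string) to the function, so every
# call sees a plain character vector of length 1:
classes <- Map(function(u) class(u), links)
print(classes)            # each element is "character"
```

Converting up front with `links <- unlist(links)` would likely make the original for loop work as well, since `links[i]` on a character vector is itself a character string; `mode = 'wb'` is still worth keeping for the PDFs.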
IanLux