I'm trying to scrape this webpage using rvest and httr in R.

The following code makes the table listing all documents appear, but I also need to download the PDF generated for each row of the table.

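library(rvest)  # provides html_session(), html_form(), submit_form() and %>%

# `url` holds the address of the page being scraped
session <- html_session(url)
form <- html_form(session)
form <- form[[1]]  # the page's first form
res <- session %>%
  submit_form(form)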

From what I read here, I understand that

WebForm_PostBackOptions("ctl00$ConteudoPagina$gdvEntidade$ctl03$lnkArquivo", "", true, "", "", false, true))

adds two new parameters to the POST request:

  • _EVENTTARGET = 'ctl00$ConteudoPagina$gdvEntidade$ctl03$lnkArquivo'
  • _EVENTARGUMENT = ''
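
The effect of that postback can be sketched in plain R as adding two entries to the list of form values before it is POSTed. This is only a minimal sketch, not the page's actual script: `add_postback_fields` is a hypothetical helper, the `__VIEWSTATE` value is a placeholder, and the double-underscore names `__EVENTTARGET` and `__EVENTARGUMENT` are the conventional ASP.NET WebForms hidden-field names, which should be checked against the names that appear in this page's HTML.

```r
# Hypothetical helper: returns the form values with the two postback
# entries set, mimicking what the JavaScript link does before submitting.
add_postback_fields <- function(values, target, argument = "") {
  # Conventional ASP.NET WebForms names (assumption: this page uses them).
  values[["__EVENTTARGET"]] <- target
  values[["__EVENTARGUMENT"]] <- argument
  values
}

# Placeholder form values; a real page supplies its own hidden inputs.
values <- list(`__VIEWSTATE` = "placeholder")
values <- add_postback_fields(
  values,
  target = "ctl00$ConteudoPagina$gdvEntidade$ctl03$lnkArquivo"
)
str(values)
```

Keeping the values in a plain named list makes it easy to inspect exactly which names and values would be sent.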

So I added this to the form values with:

form$fields[["_EVENTTARGET"]] <- list(name = "_EVENTTARGET", value = 'ctl00$ConteudoPagina$gdvEntidade$ctl03$lnkArquivo')
form$fields[["_EVENTARGUMENT"]] <- list(name = "_EVENTARGUMENT", value = '')

And re-submitted the form:

res2 <- session %>%
  submit_form(form)

But res2 was identical to res. How should I submit the form to get the PDF?
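
One way to check what actually comes back is to send the POST directly with httr and test whether the response body starts with the PDF signature `%PDF-`. The following is a minimal sketch under assumptions: `is_pdf` and `fetch_pdf` are hypothetical helpers, `values` is assumed to be a named list holding every field the server expects (including the page's hidden inputs), and the request is assumed to carry whatever cookies the server requires. `fetch_pdf` is defined but not called here because it needs network access.

```r
# Hypothetical helper: TRUE if the raw bytes begin with the PDF signature.
is_pdf <- function(bytes) {
  length(bytes) >= 5 && identical(bytes[1:5], charToRaw("%PDF-"))
}

# Hypothetical helper: POST the form values as a URL-encoded form and
# write the body to `dest` only if it looks like a PDF.
fetch_pdf <- function(url, values, dest) {
  resp <- httr::POST(url, body = values, encode = "form")
  bytes <- httr::content(resp, as = "raw")
  if (is_pdf(bytes)) {
    writeBin(bytes, dest)
    TRUE
  } else {
    FALSE  # most likely the HTML page again, not the document
  }
}

# Offline check of the signature test on two small byte vectors.
pdf_like  <- is_pdf(charToRaw("%PDF-1.4"))
html_like <- is_pdf(charToRaw("<!DOCTYPE html>"))
```

If `fetch_pdf` returns `FALSE`, comparing the returned HTML with `res` can show whether the server ignored the added parameters.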

Daniel Falbel