I'm trying to scrape this webpage using rvest and httr in R.
With the following code I got the table listing all the documents to appear, but I need to download the PDF
produced for each row of the table.
library(rvest)

# Open a session on the page and submit its first form unchanged
session <- html_session(url)
form <- html_form(session)
form <- form[[1]]
res <- session %>%
  submit_form(form)
From what I read here, I understand that the call
WebForm_PostBackOptions("ctl00$ConteudoPagina$gdvEntidade$ctl03$lnkArquivo", "", true, "", "", false, true))
adds two new parameters to the POST request:
- _EVENTTARGET = 'ctl00$ConteudoPagina$gdvEntidade$ctl03$lnkArquivo'
- _EVENTARGUMENT = ''
So I added this to the form values with:
form$fields[["_EVENTTARGET"]] <- list(name = "_EVENTTARGET", value = 'ctl00$ConteudoPagina$gdvEntidade$ctl03$lnkArquivo')
form$fields[["_EVENTARGUMENT"]] <- list(name = "_EVENTARGUMENT", value = '')
Then I re-submitted the form:
res2 <- session %>%
  submit_form(form)
But res2 was identical to res. How should I submit the form to get the PDF?
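
For reference, here is a minimal sketch of the POST body that the postback described above seems to correspond to. It rests on several assumptions: `build_postback_body` is a hypothetical helper, not part of rvest or httr; the `__VIEWSTATE` and `__EVENTVALIDATION` values are placeholders for the hidden fields the real page returns; and the field names use two leading underscores (`__EVENTTARGET`, `__EVENTARGUMENT`), which is the usual ASP.NET Web Forms naming and differs from the single-underscore names used earlier. Whether this particular page accepts such a request is untested.

```r
# Hypothetical helper: combine the hidden fields scraped from the page with
# the two fields that the browser-side postback would normally fill in.
build_postback_body <- function(hidden_fields, event_target, event_argument = "") {
  body <- hidden_fields
  body[["__EVENTTARGET"]] <- event_target
  body[["__EVENTARGUMENT"]] <- event_argument
  body
}

# Placeholder values; the real ones would be read from the page's form.
hidden <- list(
  `__VIEWSTATE` = "placeholder",
  `__EVENTVALIDATION` = "placeholder"
)

body <- build_postback_body(
  hidden,
  event_target = "ctl00$ConteudoPagina$gdvEntidade$ctl03$lnkArquivo"
)
str(body)

# Sending it might then look roughly like this (not run here; `url` is the
# page URL used earlier):
# resp <- httr::POST(url, body = body, encode = "form")
# writeBin(httr::content(resp, "raw"), "document.pdf")
```

Building the body as a plain named list keeps it independent of the form object, so the same helper could be reused for each row by changing only the event target.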