I am trying to web-scrape a table from an interactive .aspx web page. I've read all of the R web-scraping questions on Stack Overflow and I think I am getting close, but I can't quite seem to get it.
I would like to pull data from the tables produced here. Eventually I'd like to loop through each date period and state option, but my challenge is really just getting R to submit my parameters and pull in the resulting table for any particular query.
From what I gather, the answer likely involves the RCurl and XML packages: posting a form with my parameters and then reading in the HTML of the resulting page.
My most recent effort looks like this:
library(RCurl)
library(XML)

# reuse one handle so any cookies persist across requests
curl = getCurlHandle()
link = "http://indiawater.gov.in/IMISReports/Reports/WaterQuality/rpt_WQM_HabitationWiseLabTesting_S.aspx"

# fetch the page once, then POST the dropdown selections back
html = getURL(link, curl = curl)
params = list('ctl00$ContentPlaceHolder$ddFinYear' = '2005-2006',
              'ctl00$ContentPlaceHolder$ddState' = 'BIHAR')
html2 = postForm(link, .params = params, curl = curl)
tables = readHTMLTable(html2)
It's hard for me to tell at what point I've encountered a problem. On the one hand, html == html2 returns FALSE, so I think html2 has progressed to some point after submitting the form. But it's still not clear to me whether the form was submitted incorrectly, or whether that worked and it's the reading in of the table that's failing.
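One thing I suspect, but haven't confirmed against this site: ASP.NET WebForms pages normally keep server state in hidden inputs (__VIEWSTATE, __EVENTVALIDATION, etc.), and a POST that doesn't echo those fields back is usually rejected or just returns the unchanged page. Here is a rough sketch of what I mean; the control names are copied from my attempt above, the hidden-field handling is the new part, and I haven't been able to test it end-to-end against the live site:

```r
library(RCurl)
library(XML)

# Pull every hidden <input> (ASP.NET's __VIEWSTATE, __EVENTVALIDATION, ...)
# out of a page so they can be posted back along with the form values.
hidden_fields = function(html) {
  doc   = htmlParse(html, asText = TRUE)
  nodes = getNodeSet(doc, "//input[@type='hidden']")
  vals  = lapply(nodes, xmlGetAttr, "value")
  names(vals) = sapply(nodes, xmlGetAttr, "name")
  vals
}

# Sketch of the full round trip: GET the page, merge its hidden state
# with the dropdown selections, POST, then read the result's tables.
scrape_state_year = function(link, state, year) {
  curl  = getCurlHandle(cookiejar = "")   # keep the ASP.NET session cookie
  html  = getURL(link, curl = curl)
  params = c(hidden_fields(html),
             list('ctl00$ContentPlaceHolder$ddFinYear' = year,
                  'ctl00$ContentPlaceHolder$ddState'   = state))
  html2 = postForm(link, .params = params, curl = curl)
  readHTMLTable(html2, stringsAsFactors = FALSE)
}
```

If this is on the right track, something like scrape_state_year(link, 'BIHAR', '2005-2006') would replace the block above, but I don't know whether this page also needs __EVENTTARGET set to simulate the dropdown postback.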
Any suggestions and help are appreciated. Thanks!