Total noob, obviously. Teaching self Python for web scraping in the interest of open records/government transparency/reporting/etc.
There's an .aspx page I want to scrape, a week-by-week calendar for January through March 2012.
But it has no forms ...
Perhaps you fine people can tell me if a solution is even possible before I spend days fighting with it.
http://webmail.legis.ga.gov/Calendar/default.aspx?chamber=house
The only way to see the appointments on the calendar is by choosing a day on a picture of a calendar. But, at least, if you click on Monday, it shows all the week's appointments. (I would like to gather all those appointments in order to count how often each committee meets, a bit of a proxy for counting what kind of legislation gets attention and what kind is ignored.)
So, what strategy should I use? It appears that, at least down in the bowels of the page source, each month is identified by what looks like a sequential four-digit number prefixed with a "V", like V4414, and each day by a bare four-digit number with no prefix.
I'm only hunting Jan - Mar 2012; other months are non-germane and mostly empty.
a clue?
...<a href="javascript:__doPostBack('calMain','V4414')" style="color:#333333" title="Go to the previous month">February</a></td><td align="center" style="width:70%;">March 2012</td><td align="right" valign="bottom" style="color:#333333;font-size:8pt;font-weight:bold;width:15%;"><a href="javascript:__doPostBack('calMain','V4474')" style="color:#333333" title="Go to the next month">April</a></td></tr>
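For what it's worth, links like this should not need a browser to follow. In ASP.NET WebForms, `__doPostBack(target, argument)` normally copies its two arguments into hidden inputs named `__EVENTTARGET` and `__EVENTARGUMENT` and submits the page's form, so a scraper can try sending the same POST itself. Below is a rough standard-library sketch, assuming this page carries the usual hidden fields such as `__VIEWSTATE` (not verified against this site). `HiddenFieldParser`, `build_postback_payload`, and `fetch_postback` are made-up names for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

URL = "http://webmail.legis.ga.gov/Calendar/default.aspx?chamber=house"


class HiddenFieldParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_payload(html, target, argument):
    """Return the form fields that a __doPostBack(target, argument) click would send."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)           # __VIEWSTATE etc., echoed back unchanged
    payload["__EVENTTARGET"] = target       # e.g. 'calMain'
    payload["__EVENTARGUMENT"] = argument   # e.g. '4439' or 'V4414'
    return payload


def fetch_postback(target, argument, url=URL):
    """GET the page, then POST the postback. Hypothetical helper, untested against the live site."""
    with urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    data = urlencode(build_postback_payload(html, target, argument)).encode("ascii")
    with urlopen(Request(url, data=data)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Something like `fetch_postback('calMain', '4439')` would then, if the assumptions hold, return the HTML for the week containing February 26. If the server also expects a session cookie, the two requests would need to share a cookie jar, which this sketch does not do.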
a pattern?
...<td align="center" style="color:#999999;width:14%;"><a href="javascript:__doPostBack('calMain','4439')" style="color:#999999" title="February 26">26</a></td><td align="center" style="color:#999999;width:14%;"><a href="javascript:__doPostBack('calMain','4440')" style="color:#999999" title="February 27">27</a></td><td align="center" style="color:#999999;width:14%;"><a href="javascript:__doPostBack('calMain','4441')" style="color:#999999" title="February 28">28</a></td>...
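The numbers in these links are consistent with a count of days since January 1, 2000: February 26, 2012 falls 4439 days after that date, February 1, 2012 falls 4414 days after it (the "V4414" month link), and April 1, 2012 falls 4474 days after it ("V4474"). Assuming that pattern holds for the whole calendar (it is only checked here against the quoted samples), the argument for any day or month can be computed instead of scraped. `day_argument`, `month_argument`, and `mondays` are made-up names for illustration.

```python
from datetime import date, timedelta

EPOCH = date(2000, 1, 1)  # assumed epoch, inferred from the sample links


def day_argument(d):
    """Postback argument for one day, e.g. date(2012, 2, 26) gives '4439'."""
    return str((d - EPOCH).days)


def month_argument(year, month):
    """Postback argument for a month link: 'V' plus the argument of its first day."""
    return "V" + day_argument(date(year, month, 1))


def mondays(start, end):
    """Yield every Monday from start to end, inclusive."""
    d = start + timedelta(days=(7 - start.weekday()) % 7)  # Monday is weekday 0
    while d <= end:
        yield d
        d += timedelta(days=7)


if __name__ == "__main__":
    # One line per Monday in January through March 2012, with its day argument.
    for monday in mondays(date(2012, 1, 1), date(2012, 3, 31)):
        print(monday.isoformat(), day_argument(monday))
```

Since clicking a Monday shows the whole week, looping over the Mondays of the three months should be enough to cover every appointment.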
Cheers and thanks!!