My problem is as follows: I'm trying to write a scraper that steps through the order process of an airline ticketing website, so I need to scrape several pages, each of which depends on the result of the page before it. This is what I have so far:
import mechanize, urllib, urllib2
url = 'any url'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 5.2; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11')]
br.open(url)
response = br.response().read()
br.select_form(nr=1)
br.form.set_all_readonly(False)
## now I am reading out the variables of form(nr=1)
for control in br.form.controls:
    if not control.name:
        print " - (type) =", (control.type)
        continue
    print " - (name, type, value) =", (control.name, control.type, br[control.name])
## now I am modifying the variables
br['fromdate'] = '2012/11/03'
br['todate'] = '2012/11/07'
## now I am submitting the form and saving the output in the variable bookingsite
response = br.submit()
bookingsite = response.read()
And here is my problem: bookingsite now holds the HTML of the next page, which again contains a form that I want to modify and submit. How can I work with it the way I would with a normal URL? Simply by calling
br.open(bookingsite)
or is there another way to modify and submit the form in that output (and then submit the resulting page again to get the page after that)?
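
For what it's worth, here is a minimal sketch of how the follow-up steps could look. It assumes that mechanize keeps the page returned by br.submit() as the browser's current page, so the next br.select_form() call already works on the form inside bookingsite; br.open(bookingsite) would not help, because bookingsite is HTML text and not a URL. The helper names make_browser, submit_step and run_order_process are hypothetical, and the form indexes and field names have to match whatever the real site serves.

```python
def make_browser(user_agent):
    """Create a mechanize browser configured like the one in the question.

    mechanize is imported here (not at module level) so the helpers below
    can be tried out with any browser-like object.
    """
    import mechanize
    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.addheaders = [('User-agent', user_agent)]
    return br


def submit_step(br, form_index, values):
    """Fill in one form on the browser's current page and submit it.

    Assumption: after submit() the browser's current page is the response,
    so a later call to this function sees the forms of the new page.
    """
    br.select_form(nr=form_index)
    br.form.set_all_readonly(False)
    for name, value in values.items():
        br[name] = value
    response = br.submit()
    return response.read()


def run_order_process(br, url, steps):
    """Open url, then submit one form per step on each resulting page.

    steps is a list of (form_index, values) pairs; the HTML of every
    page reached is returned in order.
    """
    br.open(url)
    pages = []
    for form_index, values in steps:
        pages.append(submit_step(br, form_index, values))
    return pages
```

With a real browser this might be called as run_order_process(make_browser(user_agent), url, [(1, {'fromdate': '2012/11/03', 'todate': '2012/11/07'}), (0, {})]), where the second pair (form index 0, no changed fields) is only a placeholder for whatever the booking page actually needs.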