Python & mechanize: How to scrape through pages in a row?

Question

My problem is as follows: I'm trying to write a scraper that runs through the order process of an airline ticketing website, so I want to scrape several pages in a row, each of which depends on the result of the previous one. This is what I have so far:

    import mechanize, urllib, urllib2

    url = 'any url'
    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 5.2; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11')]
    br.open(url)
    response = br.response().read()

    br.select_form(nr=1)
    br.form.set_all_readonly(False)

    ## now I am reading out the variables of form(nr=1)

    for control in br.form.controls:
        if not control.name:
            print " - (type) =", (control.type)
            continue
        print " - (name, type, value) =", (control.name, control.type, br[control.name])

    ## now I am modifying the variables
    br['fromdate'] = '2012/11/03'
    br['todate'] = '2012/11/07'

    ## now I am submitting the form and saving the output in the variable bookingsite
    response = br.submit()
    bookingsite = response.read()

And here is my problem: the variable `bookingsite` again contains a form that I want to modify and submit. How can I work with it just like a normal URL? Simply by calling

    br.open(bookingsite)

? Or is there another way to modify and submit the output, and then submit the result again to receive the next page?

Extracting data from `bookingsite` should not be a problem, but I don't understand why you say that `bookingsite` would include a "form". It should just include the HTML response to the form that you submitted (i.e. `br.submit()`). Could you clarify? — David, Nov 12 '12 at 23:10
Hey David, thanks for your reply! Here is the explanation: the first page contains a form where you enter details such as departure and arrival dates, airports, etc. After you click submit, you come to a second page (`bookingsite`) that lists all flights to your selected airport on those dates. On this second page you need to select a specific flight; this selection happens in another form, with a radio button for each flight. I need to select one and submit this second page to get to the third page. — julianschnell, Nov 15 '12 at 14:18
Ok - then you should be able to call `br.select_form()` on the same browser object (after the submit it holds the second page) and set the radio buttons, followed by `br.submit()`. — David, Nov 15 '12 at 18:40

score 0 · Answer 1 · answered Nov 08 '13 at 14:50

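After your initial `response = br.submit()`, the browser object `br` holds the new page. Select the form on `br` (the response object itself has no `select_form()` method), using the index or name of the form on that page:

    br.select_form(nr=0)

After you've changed the values of the fields within the form, submit it:

    br.submit()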
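The two steps can be chained on one browser object, as in the minimal sketch below. It assumes that the second form is the first form on the results page (`nr=0`) and that the flights are a radio control; the function name `run_booking_steps` and the parameters `flight_control` and `flight_value` are hypothetical and need to be adapted to the real page.

```python
def run_booking_steps(br, start_url, search_fields, flight_control, flight_value):
    """Drive a two-form flow with a single mechanize-style browser object."""
    br.open(start_url)

    # Step 1: fill in and submit the search form (form index 1, as in the question).
    br.select_form(nr=1)
    for name, value in search_fields.items():
        br.form[name] = value
    br.submit()

    # Step 2: the browser now holds the results page, so its form can be
    # selected directly; the returned HTML is never passed to br.open().
    br.select_form(nr=0)  # assumed index of the flight-selection form
    br.form[flight_control] = [flight_value]  # list-type controls (radio) take a list
    response = br.submit()
    return response.read()


# Usage sketch (needs the mechanize package and a real URL; names are placeholders):
#   import mechanize
#   br = mechanize.Browser()
#   br.set_handle_robots(False)
#   html = run_booking_steps(
#       br, 'any url',
#       {'fromdate': '2012/11/03', 'todate': '2012/11/07'},
#       'flight', '1')
```

Passing the browser in as an argument keeps cookies and the current page in one place, which is what lets each step build on the previous one.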

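P.S. If you're automating booking sites, they most likely rely heavily on JavaScript, which mechanize doesn't execute. I'd suggest using Requests instead. Requests doesn't execute JavaScript either, but it makes it easy to replay the HTTP requests that the page's scripts send. You'll be happy you did.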
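For the Requests route, a rough sketch of the same page-after-page idea is shown here. It assumes each step can be expressed as a plain form POST; the helper name `post_in_sequence`, the URLs, and the field names are hypothetical and would have to be read from the real site's forms or network traffic.

```python
def post_in_sequence(session, steps):
    """POST each (url, form_data) pair in order and return the last response body.

    `session` is expected to behave like requests.Session, so cookies set by
    one step are sent along with the next one.
    """
    body = None
    for url, form_data in steps:
        response = session.post(url, data=form_data)
        response.raise_for_status()  # stop early on an HTTP error status
        body = response.text
    return body


# Usage sketch (needs the requests package; URLs and fields are placeholders):
#   import requests
#   session = requests.Session()
#   html = post_in_sequence(session, [
#       ('any url', {'fromdate': '2012/11/03', 'todate': '2012/11/07'}),
#       ('any url', {'flight': '1'}),
#   ])
```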
