I am trying to read the control information of a form on a website (http://www.proxy-listen.de/Proxy/Proxyliste.html). Ultimately I want to fill in the form, submit it, and get back a list of proxy servers. I am using this code to list the form elements:

 for control in br.form.controls:
     if not control.name:
         print " - (type) =", (control.type)
         continue  
     print " - (name, type, value) =", (control.name, control.type, br[control.name])
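
One way to cross-check a listing like this is to parse the raw HTML independently of mechanize and see which fields the page really contains. The following is a minimal sketch using only the Python 3 standard library; the class `InputLister`, the helper `list_fields`, and the sample markup are invented for illustration (only the field names are borrowed from this question), and it is not how mechanize itself parses forms.

```python
from html.parser import HTMLParser


class InputLister(HTMLParser):
    """Collect (name, type) for every <input> and <select> start tag."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("input", "select"):
            return
        attrs = dict(attrs)
        # A <select> has no type attribute; an <input> defaults to "text".
        default_type = "select" if tag == "select" else "text"
        self.fields.append((attrs.get("name"), attrs.get("type", default_type)))


def list_fields(html):
    """Return the (name, type) pairs found in an HTML string."""
    parser = InputLister()
    parser.feed(html)
    return parser.fields


# Invented sample markup, NOT the real page.
SAMPLE_HTML = """
<form method="post">
  <select name="filter_country"><option value="DE">DE</option></select>
  <input type="radio" name="liststyle" value="info">
  <input type="radio" name="liststyle" value="leech">
  <input type="submit" value="Show">
</form>
"""

for name, field_type in list_fields(SAMPLE_HTML):
    print(" - (name, type) = (%r, %r)" % (name, field_type))
```

If a field shows up in such a listing but not in `br.form.controls`, it is present in the static HTML, which would suggest that the difference lies in how the form is parsed rather than in JavaScript. Note that the sketch does not track which `<form>` each field belongs to.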

For some reason, mechanize does not list the radio buttons (named 'type' and 'liststyle', which I found by reading the page source). As a result, when I submit the form I am unexpectedly sent back to the main page (http://www.proxy-listen.de). Here is my complete code:

 #!/usr/bin/env python
 #-*- coding: utf-8 -*-

 import mechanize

 br = mechanize.Browser()
 # Set both headers in one list; assigning br.addheaders twice would discard the first value.
 br.addheaders = [
     ('Content-type', 'text/html; charset=utf-8'),
     ('User-agent', 'Mozilla/5.0 (Windows NT 5.2; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11'),
 ]
 br.set_handle_robots(False)

 url = 'http://www.proxy-listen.de/Proxy/Proxyliste.html'
 br.open(url)
 response1 = br.response()
 print response1

 ## Show me the forms of the website!
 for form in br.forms():
     print form

 ## Select form Nr 0, as this is the one and only form I am looking for
 br.select_form(nr=0)
 br.form.set_all_readonly(False)

 ## Fill the form
 br.form['filter_country'] = ['DE']
 br.form['filter_http_anon'] = ['3']
 br.form['filter_http_gateway'] = ['']
 br.form['filter_port'] = ['']
 br.form['filter_response_time_http'] = ['1']
 br.form['filter_timeouts1'] = ['10']
 br.form['proxies'] = ['300']


 ## I already tried adding the two radio buttons manually, but even this doesn't help!
 '''
 br.form.new_control('radio','liststyle',{'value': ['info', 'leech']})
 br.form.new_control('radio','type',{'value':['http', 'https', 'socks4', 'socks5']})
 br.form.fixup()
 br.form['liststyle'] = ['leech']
 br.form['type'] = ['http']
 '''

 ## just to double-check the values, loop through controls!
 for control in br.form.controls:
     if not control.name:
         print " - (type) =", (control.type)
         continue  
     print " - (name, type, value) =", (control.name, control.type, br[control.name])

 resp2 = br.submit()
 page = resp2.read()
 print page

I already checked with Firebug which variables the browser sends in the POST request and, not surprisingly, the radio buttons are part of it.
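
To make the comparison with the Firebug capture concrete, here is a rough sketch (Python 3 standard library only) of the form-encoded body one would expect once the two radio fields are included. The values come from the code above; the real form may carry further fields (hidden inputs, the submit button) that are not shown in this question, so treat the list as incomplete.

```python
from urllib.parse import urlencode, parse_qs

# Field values taken from the form-filling code in the question.
fields = [
    ("filter_country", "DE"),
    ("filter_http_anon", "3"),
    ("filter_http_gateway", ""),
    ("filter_port", ""),
    ("filter_response_time_http", "1"),
    ("filter_timeouts1", "10"),
    ("proxies", "300"),
    # The two radio buttons that mechanize did not list.
    ("liststyle", "leech"),
    ("type", "http"),
]

# Encode as application/x-www-form-urlencoded, the usual POST form encoding.
body = urlencode(fields)
print(body)

# Decode again to inspect individual fields, keeping the empty ones.
decoded = parse_qs(body, keep_blank_values=True)
print(sorted(decoded))
```

Comparing this string field by field with the request body shown in Firebug should reveal which fields the mechanize submission is missing.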

Any pointers are highly appreciated! Thank you very much.

julianschnell
  • it's probably javascript generating the buttons on the fly and mechanize does not support that so won't be able to see the final DOM that JS renders. – Paul Collingwood Nov 21 '12 at 12:15
  • Hmm, I tried it with lxml as well, and that works. I don't think lxml can handle JS either... strange. – julianschnell Nov 21 '12 at 15:09
