
I am crawling data from http://www.mca.gov.in/DCAPortalWeb/dca/MyMCALogin.do?method=setDefaultProperty&mode=53

Below is the code I have tried:

    require 'mechanize'

    uri = "http://www.mca.gov.in/DCAPortalWeb/dca/MyMCALogin.do?method=setDefaultProperty&mode=53"

    agent = Mechanize.new
    html_page = agent.get(uri)
    html_form = html_page.form
    # Select the radio button named 'search' whose value is '2'
    html_form.radiobutton_with(:name => 'search', :value => '2').check
    # submit returns the response page; this is the call that raises the 500 error
    result_page = html_form.submit
    puts result_page.body

Error:

/var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:308:in `fetch': 500 => Net::HTTPInternalServerError for http://www.mca.gov.in/DCAPortalWeb/dca/ProsecutionDetailsSRAction.do -- unhandled response (Mechanize::ResponseCodeError)
from /var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize.rb:1281:in `post_form'
from /var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize.rb:548:in `submit'
from /var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/form.rb:223:in `submit'
from ministry_corp_aff.rb:32:in `start'
from ministry_corp_aff.rb:52:in `<main>'

If I manually click the 3rd radio button and then submit the form, I get a .zip file. I am trying to extract data from the .xls file inside that zip.

matthias_h

1 Answer


The radio button has an onclick event handler that runs some JavaScript. In addition, clicking the Submit <a> tag also runs some JavaScript. That JavaScript probably sets some form values that are sent with the request and that the server checks.

Mechanize cannot execute JavaScript. You need a tool that drives a real browser, such as Selenium WebDriver, for that.
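As a rough illustration of the Selenium route, here is a minimal sketch. It assumes the selenium-webdriver gem and a Firefox driver are installed, and it assumes the submit link can be found by the link text "Submit", which I have not verified against the page. The method names `radio_selector` and `submit_search` are made up for this example.

```ruby
# Sketch only: assumes the selenium-webdriver gem and a Firefox driver are installed.

# Builds a CSS selector for a radio button with the given name and value.
def radio_selector(name, value)
  "input[type='radio'][name='#{name}'][value='#{value}']"
end

# Opens the page in a real browser so the onclick handlers run,
# selects the radio button, then clicks the submit link.
# submit_link_text is a guess; inspect the page for the real link text.
def submit_search(uri, name: 'search', value: '2', submit_link_text: 'Submit')
  require 'selenium-webdriver' # loaded lazily so radio_selector works without the gem

  driver = Selenium::WebDriver.for :firefox
  begin
    driver.get uri
    driver.find_element(css: radio_selector(name, value)).click
    driver.find_element(link_text: submit_link_text).click
    driver.current_url # URL the browser ended up on after the click
  ensure
    driver.quit # always close the browser, even if an element was not found
  end
end
```

Calling `submit_search(uri)` with the URL from the question would then perform the clicks in the browser. Saving the resulting .zip would additionally require configuring the browser to download files without prompting, which this sketch does not cover.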

7stud