
I have a website that I am trying to scrape using Mechanize. When I submit the form with Mechanize, the request ends up at a URL of the following format: https://www.website.com/Login/Options?returnURL=some_form_options (if I enter that URL in the browser, it sends me to an error page saying that the requested page does not exist).

Whereas if I submit the same form on the website in a browser, the resulting URL has this format: https://www.website.com/topic/country/list_of_form_options
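The first URL has the shape of a login redirect: the path is `/Login/Options`, and the page that was actually requested travels in the `returnURL` query parameter. One possible reading (an assumption, not something the question confirms) is that the request Mechanize sent was routed into the login flow, for example because `page.forms.first` picked the login form rather than the search form. A minimal sketch using only Ruby's standard library to decode that parameter from the URL Mechanize lands on (e.g. `page.uri.to_s`), so it can be compared with the browser's URL; the helper name `return_url` is hypothetical:

```ruby
require 'uri'

# Hypothetical helper: returns the decoded value of the "returnURL" query
# parameter, or nil when the URL has no query string or no such parameter.
def return_url(url)
  query = URI(url).query
  return nil if query.nil?

  URI.decode_www_form(query).to_h['returnURL']
end

# Example with a made-up, percent-encoded returnURL value:
puts return_url('https://www.website.com/Login/Options?returnURL=%2Ftopic%2Fcountry%2Fopts')
# prints /topic/country/opts
```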

The website has a login form, but filling it in is not required to submit a search query.

Any idea why submitting the same form with Mechanize gives a different URL? And how can I work around it? I cannot process the page at the URL I get after submitting the form with Mechanize.

Thanks!

harvey

1 Answer


Find the exact form you want to submit and submit that one. If you can't find the path the browser uses, you can also add the missing form fields yourself with Mechanize and then submit the form. Here is the code I used in my project.
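Mechanize does not run JavaScript: `form.submit` sends the request to whatever the form's `action` attribute says (you can check it with `form.action`, and pick a specific form with `page.form_with(action: ...)` or `page.form_with(name: ...)` instead of `page.forms.first`). If the browser instead reaches `/topic/country/...` because a script builds that path, one workaround is to construct the URL yourself and fetch it with `get`. A minimal sketch, assuming the path is simply the form values joined in the order shown in the question; the helper name `search_url` and the segment order are hypothetical:

```ruby
require 'erb'

# Hypothetical helper: builds a URL shaped like the one the browser ends up on,
#   https://www.website.com/<topic>/<country>/<option>/<option>...
# The segment order is an assumption taken from the URL pattern in the question.
# ERB::Util.url_encode percent-encodes everything except letters, digits,
# '-', '.', '_' and '~', so spaces or slashes inside a value cannot break the path.
def search_url(base, topic, country, options)
  segments = [topic, country, *options].map { |s| ERB::Util.url_encode(s.to_s) }
  "#{base.chomp('/')}/#{segments.join('/')}"
end

puts search_url('https://www.website.com/', 'topic', 'country', ['a b', 'c/d'])
# prints https://www.website.com/topic/country/a%20b/c%2Fd
```

With Mechanize the result could then be fetched directly, e.g. `page = ua.get(search_url('https://www.website.com', topic, country, options))`, skipping the form submission. Whether the site accepts such a direct GET is worth confirming in the browser first.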

I created a rake task for this:

require 'mechanize'

namespace :test_namespace do
  task :mytask => [:environment] do
    # Escape the search term: a raw space is not valid in a URL
    site = "http://www.website.com/search/search.aspx?term=#{URI.encode_www_form_component('search term')}"
    # prepare user agent
    ua = Mechanize.new
    page = ua.get(site)
    loop do
      page.search("//div[@class='resultsNoBackground']").each do |res|
        puts res.at("table").at('tr').at('td').text
        link = res.at_css('strong').at('a')
        link_text = link.text
        # Resolve the (possibly relative) href against the current page's URL
        link_href = page.uri.merge(link['href']).to_s
        page_content = ''
        res.css('span').each do |ss|
          ss.css('strong').remove
          page_content = ss.text.gsub(/Vi.*s\)/, '')
        end
        # link_text, link_href and page_content are ready for further processing here
      end

      # Stop when the current page has no "Next" link
      break unless page.at("#ctl00_ContentPlaceHolder1_ctrlResults_gvResults_ctl01_lbNext")

      # Imitate ASP.NET's __doPostBack. The hidden fields normally already exist
      # in the form; form[]= sets them (or adds them if missing) instead of
      # posting duplicates the way add_field! would.
      form = page.forms.first
      form['__EVENTTARGET'] = "ctl00$ContentPlaceHolder1$ctrlResults$gvResults$ctl01$lbNext"
      form['__EVENTARGUMENT'] = ""
      page = form.submit
    end
  end
end
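The pagination step above imitates ASP.NET WebForms: a LinkButton usually renders as `<a href="javascript:__doPostBack('uniqueID','')">`, and `__doPostBack` copies its two arguments into the hidden `__EVENTTARGET` and `__EVENTARGUMENT` fields before submitting the form. Rather than hard-coding the target, it could be read off the link itself. A minimal sketch with a hypothetical helper `postback_args`, assuming the plain `__doPostBack('...','...')` form; controls with validation may render `WebForm_DoPostBackWithOptions` instead, which this pattern does not handle:

```ruby
# Hypothetical helper: pulls the event target and argument out of an
# ASP.NET-style href such as
#   javascript:__doPostBack('ctl00$Main$lbNext','')
# and returns them keyed by the hidden-field names the postback sets.
# Returns nil for hrefs that are not a plain __doPostBack call.
def postback_args(href)
  m = href.to_s.match(/__doPostBack\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)/)
  return nil unless m

  { '__EVENTTARGET' => m[1], '__EVENTARGUMENT' => m[2] }
end

p postback_args("javascript:__doPostBack('ctl00$ContentPlaceHolder1$ctrlResults$gvResults$ctl01$lbNext','')")
```

In the task, `args = postback_args(page.at('#ctl00_ContentPlaceHolder1_ctrlResults_gvResults_ctl01_lbNext')['href'])` followed by `args.each { |name, value| form[name] = value }` could then replace the hard-coded target string.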
Raza Hussain