I'm using mechanize to scrape a few pages. Pagination is implemented with a JavaScript POST, and the pagination links are actually input buttons that are not inside a form. Any idea how I could trigger a click on them?

I'm still working on the script and can use either mechanize-ruby or mechanize-python. A solution in either would help.

The HTML of the buttons is:

<input name="px" value="1" class="pSel" disabled="true" type="button">
<input name="px" value="2" class="page_select" onclick="apply_pagination(this);" type="button">
<input name="px" value="3" class="page_select" onclick="apply_pagination(this);" type="button">
...
<input name="px" value="10" class="page_select" onclick="apply_pagination(this);" type="button">
zsquare

1 Answer

With mechanize-ruby you can work out the POST parameters in one of two ways:

  1. Read the apply_pagination JavaScript function and work out what it posts back to the web server.

  2. Click one of the buttons in a browser and monitor the POST request with a tool such as [HTTPFox](https://addons.mozilla.org/en-us/firefox/addon/httpfox/).

Once you know what the web server expects from the user agent, you can replicate the request with something like this:

next_page = agent.post("http://example.com/", { "foo" => "bar" })
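To walk every page, the same idea can be put in a loop. This is only a rough sketch: it assumes the server accepts a POST to the page's own URL with a single `px` field (the name is taken from the button markup in the question), which may not match what apply_pagination really sends. The helper names `pagination_params` and `fetch_all_pages` are made up for illustration, and `extra` is a placeholder for any other fields you find in the real request.

```ruby
# Build the form fields for one page. The "px" name comes from the button
# markup; any additional fields the real request needs are passed in `extra`.
def pagination_params(page_number, extra = {})
  extra.merge("px" => page_number.to_s)
end

# Fetch the first page, then POST once for every enabled pagination button.
# `agent` is expected to behave like a Mechanize instance (responds to
# get and post); the pages it returns must respond to search.
def fetch_all_pages(agent, url, extra = {})
  first = agent.get(url)
  # Only the enabled buttons carry the page_select class in the question's HTML.
  numbers = first.search("input.page_select").map { |button| button["value"] }
  [first] + numbers.map { |n| agent.post(url, pagination_params(n, extra)) }
end

# Usage (needs the mechanize gem and a real URL):
#   require "mechanize"
#   pages = fetch_all_pages(Mechanize.new, "http://example.com/")
```

Passing the agent in as an argument keeps the helper easy to try against a stub before pointing it at the real site.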
Kibet Yegon