I want to click a link with Mechanize that I select with XPath (via Nokogiri).

How is that possible?

    next_page = page.search "//div[@class='grid-dataset-pager']/span[@class='currentPage']/following-sibling::a[starts-with(@class, 'page')][1]"
    next_page.click

The problem is that a Nokogiri element doesn't have a click method.

I can't read the href (URL) and send a GET request, because the link only has an onclick handler defined (no href attribute).

If that's not possible, what are the alternatives?

Phrogz
all jazz

3 Answers

  1. Use page.at instead of page.search when you're trying to find only one element.

  2. You can make your selector simpler (shorter) by using CSS selector syntax:

    next_page = page.at('div.grid-dataset-pager > span.currentPage + a[class^="page"]')
    
  3. You can construct your own Link instance if you have the Nokogiri element, page, and mechanize object to feed the constructor:

    next_link = Mechanize::Page::Link.new( next_page, mech, page )
    next_link.click
    
  4. However, you might not need that, because Mechanize#click lets you supply a string with the text of the anchor/button to click on.

    # Assuming this link text is unique on the page, which I suspect it is
    mech.click next_page.text
    
  5. Edit after re-reading the question completely: However, none of this is going to help you, because Mechanize is not a web browser! It does not have a JavaScript engine, and thus won't (can't) execute your onclick for you. For this you'll need to use Ruby to control a real web browser, e.g. using Watir or Selenium or Celerity or the like.

Phrogz

In general, if next_link is the Nokogiri node, you would do:

    page.link_with(:node => next_link).click

However, as Phrogz says, this won't really do what you want.

pguardiario

Why don't you use an Hpricot element instead? Mechanize can click on an Hpricot element as long as the link has a 'src' or 'href' attribute. Try something along these lines:

    page = agent.get("http://www.example.com")
    next_page = agent.click((page/"//your/xpath/a"))

Edit: After reading Phrogz's answer, I realized that this won't really do it either. Mechanize doesn't support JavaScript yet. With this in mind, you have three options.

  1. Use a library that controls a real web browser. See @Phrogz's answer.
  2. Use Capybara, which is an integration testing library but can also be used as a standalone crawler. I've done this successfully with HTMLUnit, which is also an integration testing library, in Java. Capybara comes with Selenium support by default, though it also supports WebKit via an external gem. Capybara interprets JavaScript out of the box. This blog post might help.
  3. Study the page that you intend to crawl, use something like HTTPFox to monitor what the onclick JavaScript function does, and replicate this in your Mechanize script.
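
Option 3 could look roughly like the sketch below. It is only an illustration: the handler name goToPage, its single numeric argument, and the "page" query parameter are all hypothetical, so you would need to replace them with whatever the onclick on your page actually does.

```ruby
require "uri"

# ASSUMPTION: the onclick looks like "goToPage(3); return false;".
# The handler name and its argument are hypothetical; inspect your page.
ONCLICK_PATTERN = /\AgoToPage\((\d+)\)/

# Returns the page number encoded in an onclick attribute, or nil if the
# attribute is missing or does not match the assumed pattern.
def page_number_from_onclick(onclick)
  match = ONCLICK_PATTERN.match(onclick.to_s.strip)
  match && Integer(match[1])
end

# Builds the URL the handler is assumed to request by setting a
# (hypothetical) "page" query parameter on the current URL.
def next_page_url(base_url, page_number)
  uri = URI.parse(base_url)
  params = URI.decode_www_form(uri.query.to_s)
  params.reject! { |key, _| key == "page" } # drop any existing page value
  params << ["page", page_number.to_s]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

onclick = "goToPage(3); return false;"
number = page_number_from_onclick(onclick)
puts next_page_url("http://www.example.com/list?sort=name", number)
```

In a Mechanize script, the onclick string would come from the Nokogiri element (next_page["onclick"]), and the resulting URL would be passed to agent.get. If the handler sends a POST or an XHR request instead, the same idea applies, but you would have to replicate that request rather than a plain GET.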

Good luck.

Kibet Yegon