click on xpath link with Mechanize

Question

I want to use Mechanize to click a link that I select with XPath (Nokogiri).

How is that possible?

    next_page = page.search "//div[@class='grid-dataset-pager']/span[@class='currentPage']/following-sibling::a[starts-with(@class, 'page')][1]"
    next_page.click

The problem is that a Nokogiri element doesn't have a click method.

I can't read the href (URL) and send a GET request myself, because the link has no href attribute, only an onclick handler.

If that's not possible, what are the alternatives?

Phrogz · Accepted Answer · 2012-07-25T22:33:08.507

Use page.at instead of page.search when you're trying to find only one element.

You can make your selector simpler (shorter) by using CSS selector syntax:

    next_page = page.at('div.grid-dataset-pager > span.currentPage + a[class^="page"]')

You can construct your own Link instance if you have the Nokogiri element, page, and mechanize object to feed the constructor:
```
next_link = Mechanize::Page::Link.new( next_page, mech, page )
next_link.click
```
However, you might not need that, because Mechanize#click lets you supply a string with the text of the anchor/button to click on.
```
# Assuming this link text is unique on the page, which I suspect it is
mech.click next_page.text
```
Edit after re-reading the question completely: However, none of this is going to help you, because Mechanize is not a web browser! It does not have a JavaScript engine, and thus won't (can't) execute your onclick for you. For this you'll need to use Ruby to control a real web browser, e.g. using Watir or Selenium or Celerity or the like.

pguardiario · Answer 2 · score 3 · answered Jul 26 '12 at 01:48

In general you would do:

    page.link_with(:node => next_page).click

However, as Phrogz says, this won't really do what you want, because the link has no href for Mechanize to follow.

pguardiario

Kibet Yegon · Answer 3 · 2012-07-26T07:21:30.410

Why don't you use an Hpricot element instead? Mechanize can click on an Hpricot element as long as the link has a 'src' or 'href' attribute. Try something along these lines:

    page = agent.get("http://www.example.com")
    # page / "..." returns a set of nodes, so take the first match before clicking
    next_page = agent.click((page / "//your/xpath/a").first)

Edit: After reading Phrogz's answer I also realized that this won't really do it. Mechanize doesn't support JavaScript. With this in mind you have three options:

1. Use a library that controls a real web browser. See @Phrogz's answer.
2. Use Capybara, which is an integration testing library but can also be used as a standalone crawler. I've done this successfully with HtmlUnit, which is also an integration testing library, in Java. Capybara comes with Selenium support by default, though it also supports WebKit via an external gem. With the Selenium driver, Capybara executes JavaScript out of the box. This blog post might help.
3. Grok the page that you intend to crawl, use something like HTTPFox to monitor what the onclick JavaScript function does, and replicate this in your Mechanize script.
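
A rough sketch of that last option, using only the standard library. Everything specific here is an assumption for illustration: the `goToPage(N)` handler format and the `page` query parameter are hypothetical, so inspect the real page's requests to find out what its onclick actually does before adapting this.

```ruby
require 'uri'

# Hypothetical handler format: onclick="goToPage(3)". The real page may differ.
ONCLICK_PAGE = /goToPage\((\d+)\)/

# Returns the page number embedded in an onclick attribute, or nil if there is none.
def page_number_from(onclick)
  match = ONCLICK_PAGE.match(onclick.to_s)
  match && match[1].to_i
end

# Builds the URL the handler is assumed to request.
# "page" is a hypothetical parameter name, not something taken from the question.
def next_page_url(base_url, onclick)
  number = page_number_from(onclick)
  return nil unless number

  uri = URI(base_url)
  params = URI.decode_www_form(uri.query.to_s)
  params.reject! { |key, _| key == 'page' } # drop any existing page parameter
  params << ['page', number.to_s]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts next_page_url('http://www.example.com/list?sort=name', 'goToPage(3); return false;')
```

In a Mechanize script you would then pass the element's `onclick` attribute and the current page URL to `next_page_url` and hand the result to `agent.get`. If the handler turns out to submit a form or send a POST instead, the same idea applies but the request you rebuild will look different.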

Good luck.

hpricot is *so* three years ago. – pguardiario, Jul 26 '12 at 01:49
