Is it possible to find the <td>..</td> text, when any of the <td>..</td> value is known?

Question

I have a webpage whose HTML looks similar to the following:

<form name="test">

<td> .... </td>
  .
  .
  .
<td> <A HREF="http://www.edu/st/file.html">alo</A> </td>
<td> <A HREF="http://www.dom/st/file.html">foo</A> </td>
<td> bla bla </td>

</form>

Now, I know only the value bla bla. Based on that value, can we track or find the third-from-last <td>..</td> value (which here is alo)? I can track those with the help of the HREF values, but the HREF values are not fixed; they can be anything at any time.

Again, your HTML isn't real helpful. Is there a `<tr>` wrapping each `<td>` or are they all embedded within one `<tr>`? Where are the `<table>` opening and closing tags? There are innumerable questions on SO for parsing HTML tables using Nokogiri. – the Tin Man, Jan 22 '13 at 21:16
@theTinMan All are in one `<tr>`. But I am not using `nokogiri`; I am using `mechanize` instead. – Arup Rakshit, Jan 22 '13 at 21:21

score 1 · Answer 1 · answered Jan 22 '13 at 20:15

See http://nokogiri.org/

It helps you parse HTML and then find elements via selectors.

answered Jan 22 '13 at 20:15

BvuRVKyUVlViVIc7


I am using `mechanize`. So I couldn't use `nokogiri`. I am looking for a `mechanize` solution for the same. – Arup Rakshit Jan 22 '13 at 20:18
Mechanize uses Nokogiri for parsing, so you can use the same selectors. – BvuRVKyUVlViVIc7 Jan 22 '13 at 20:20
Okay! I am not very familiar with Nokogiri. Can you help me with some sample code for my example? – Arup Rakshit Jan 22 '13 at 20:22
How would I use a `mechanize` page or form object with `nokogiri`? – Arup Rakshit Jan 22 '13 at 20:38
@PythonLikeYOU Use `page.parser` to get the nokogiri parser, then manipulate it with all nokogiri functions. It is the same as `Nokogiri::HTML(page.body)`. – Guilherme Bernal Jan 28 '13 at 17:03

score 1 · Accepted Answer · answered Jan 22 '13 at 21:26

Extracting every <td> from an HTML document is easy, but it's not a foolproof way to navigate the DOM. However, given the limitations of the sample HTML, here's a solution. I doubt it'll work in a real-world situation though.

Mechanize uses Nokogiri internally for its heavy lifting, so doing `require 'nokogiri'` isn't necessary if you've already required Mechanize.

require 'nokogiri'

doc = Nokogiri::HTML::DocumentFragment.parse(<<EOT)
<td> <A HREF="http://www.edu/st/file.html">alo</A> </td>
<td> <A HREF="http://www.dom/st/file.html">foo</A> </td>
<td> bla bla </td>
EOT

doc.search('td')[-3].at('a')['href']
# => "http://www.edu/st/file.html"

How to get the Nokogiri document from the Mechanize "agent" is left as an exercise for the user.

answered Jan 22 '13 at 21:26

the Tin Man


Thank you, sir, for helping me here. But I am using mechanize, so I would like to know how I can use a mechanize `page` or `form` object with `nokogiri`. – Arup Rakshit Jan 22 '13 at 21:28
You'll need to search for the answer. The Nokogiri docs don't mention anything about Mechanize. And, as @lichtamberg and I said, Mechanize uses Nokogiri, so you *ARE* using Nokogiri and have it available to you. – the Tin Man Jan 22 '13 at 21:30
I found it: `doc = Nokogiri::HTML::DocumentFragment.parse(agent.current_page().body)` :) – Arup Rakshit Jan 22 '13 at 21:52
Wrong, because you don't need to reparse anything. It's already a Nokogiri document when Mechanize hands it off. Search on Stack Overflow. There are many questions about it. – the Tin Man Jan 22 '13 at 22:36
