
I would like to find specific tags within a single Node that belongs to a NodeSet, but when I use XPath on that Node it returns results from the whole NodeSet.

I'm trying to get something like:

{ "head1" => "Volume 1", "head2" => "Volume 2" }

from this HTML:

<h2 class="header">
  <a class="header" >head1</a>
</h2>
<table class="volume_description_header" cellspacing="0">
  <tbody>
    <tr>
      <td class="left">Volume 1</td>
    </tr>
  </tbody>
</table>
<h2 class="header">
  <a class="header" >head2</a>
</h2>
<table class="volume_description_header" cellspacing="0">
  <tbody>
    <tr>
      <td class="left">Volume 2</td>
    </tr>
  </tbody>
</table>

So far I've tried:

require 'nokogiri'
a = File.open("code-above.html") { |f| Nokogiri::HTML(f) }
h = a.xpath('//h2[@class="header"]')
puts h.map { |e| e.next.next }[0].xpath('//td[@class="left"]')

But with this I get:

<td class="left">Volume 1</td>
<td class="left">Volume 2</td>

I'm expecting only the first one.

I've tried doing the XPath inside the block, but this gives me the same result twice.

I checked and

puts h.map { |e| e.next.next }[0]

evaluates to the first Node, so I don't understand why XPath searches the whole NodeSet, or even the whole Nokogiri::Document, which I think is what it actually does.

Can somebody please explain to me the principles of searching and navigating within a selected Node/NodeSet rather than the whole Document? Maybe navigating down a known path would be better in this case, but I don't know how to do that either.

the Tin Man
Bart C

1 Answer


Your second XPath expression, //td[@class="left"], starts with //. A leading // tells XPath to start matching at the root of the entire document, regardless of which node you call xpath on. What you want is to start from the current node. To do that, start your expression with a dot, giving .//:

d.xpath('.//td[@class="left"]')
matt
  • Thanks matt. I thought maybe without // would work, like in search, but it didn't. I guess this is the only way to start at the current node in xpath. – Bart C Oct 21 '15 at 14:14