
I'm trying to find all the tables below my current node without also including the nested tables. In other words, if I have this, I want to find "yes" and not "no":

<table> <!-- outer table - no -->
  <tr><td>
    <div> <!-- *** context node *** -->
      <table> <!-- yes -->
        <tr><td>
          <table> ... </table> <!-- no -->
        </td></tr>
      </table>
      <table> <!-- yes -->
        <tr><td>
          <table> ... </table> <!-- no -->
        </td></tr>
      </table>
    </div>
  </td></tr>
</table>

Is there any easy way to do this in XPath 1.0? (In 2.0, it'd be .//table except .//table//table, but I don't have 2.0 as an option.)

EDIT: Please note that the answers so far don't respect the idea of the current context node. I don't know how far down the first layer of tables might be (and it might differ), and I also don't know whether I might be inside another table (or two or three).

Literally, I want what .//table except .//table//table gives in XPath 2.0, but I only have XPath 1.0.

Randal Schwartz
  • I think it is impossible to write this as a single XPath 1.0 expression, because I need to use one context node multiple times and that is not allowed. Can I use two XPaths, one to get the value of a variable and a second to get the required tables? – Gaim Jan 13 '10 at 17:56
  • You've made this a CW for what reason? It's quite a tricky question which will have a right answer; it's not a candidate for CW. – AnthonyWJones Jan 14 '10 at 14:59
  • What's a "CW"? And who are you addressing as "you" there? Me? How did I make this a "CW"? :) – Randal Schwartz Jan 14 '10 at 16:33
  • CW is "Community Wiki" - at the most basic level it makes your question a: more editable by other people, and b: you don't gain any rep points from it; but in reality it tends to imply it is a more discussion-based question. You did this by clicking a checkbox, and there is no "undo" - but ultimately if you get a suitable answer this doesn't make a huge amount of difference. – Marc Gravell Jan 15 '10 at 12:41
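Gaim's two-query idea can be sketched in Perl with HTML::TreeBuilder::XPath, the module used in one of the answers below. The assumption is that the host language may run one XPath, read back a number, and splice it into a second expression; the variable names here are illustrative. The first query counts the tables at or above the context node. The second keeps only descendant tables with exactly that many table ancestors, because a table nested inside another table below the context node always has at least one more.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Same shape as the question: outer table, context div, two "yes"
# tables, each wrapping one nested "no" table.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content(
      '<table><tr><td><div>'
    . '<table><tr><td>yes 1<table><tr><td>no</td></tr></table></td></tr></table>'
    . '<table><tr><td>yes 2<table><tr><td>no</td></tr></table></td></tr></table>'
    . '</div></td></tr></table>'
);

# The context node.
my ($context) = $tree->findnodes('/html/body/table/tr/td/div');

# Query 1: tables enclosing the context node. "-or-self" covers the
# case where the context node is itself a table.
my @enclosing = $context->findnodes('ancestor-or-self::table');
my $depth     = scalar @enclosing;

# Query 2: descendant tables with no table between them and the
# context node, i.e. exactly $depth table ancestors.
my @tables =
  $context->findnodes(".//table[count(ancestor::table) = $depth]");

printf "%d table(s) selected\n", scalar @tables;
```

For the question's structure this should select the two "yes" tables whatever the nesting depth of the context node. If the host API supports XPath variables, $depth could be bound as a variable instead of interpolated into the string; the second expression itself stays pure XPath 1.0.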

5 Answers


I think you want child::table, a.k.a. table:

#!/usr/bin/perl --
use strict;
use warnings;

use HTML::TreeBuilder;
{
  my $tree = HTML::TreeBuilder->new();

  $tree->parse(<<'__HTML__');
<table> <!-- outer table - no -->
  <tr><td>
    <div> <!-- *** context node *** -->
      <table> <!-- yes -->
        <tr><td>
          <table> ... </table> <!-- no -->
        </td></tr>
      </table>
      <table> <!-- yes -->
        <tr><td>
          <table> ... </table> <!-- no -->
        </td></tr>
      </table>
    </div>
  </td></tr>
</table>
__HTML__

  sub HTML::Element::addressx {
    return join(
      '/',
      '/', # // ROOT
      reverse(    # so it starts at the top
        map {
          my $n = $_->pindex() || '0';    # root's pindex is undef -> '0'
          my $t = $_->tag;
          $t . '[' . $n . ']'             # e.g. "table[1]" (0-based index)
        } $_[0],          # self and...
        $_[0]->lineage    # ...all its ancestors
      )
    );
  } ## end sub HTML::Element::addressx

  # Print the address of every div and table to show how they nest.
  for my $elem ( $tree->look_down( _tag => qr/div|table/i ) ) {
    print $elem->addressx, "\n";
  }
  $tree->delete;
  undef $tree;
}
__END__
//html[0]/body[1]/table[0]
//html[0]/body[1]/table[0]/tr[0]/td[0]/div[0]
//html[0]/body[1]/table[0]/tr[0]/td[0]/div[0]/table[0]
//html[0]/body[1]/table[0]/tr[0]/td[0]/div[0]/table[0]/tr[0]/td[0]/table[0]
//html[0]/body[1]/table[0]/tr[0]/td[0]/div[0]/table[1]
//html[0]/body[1]/table[0]/tr[0]/td[0]/div[0]/table[1]/tr[0]/td[0]/table[0]

And the second part:

#!/usr/bin/perl --

use strict;
use warnings;

use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content(<<'__HTML__');
<table> <!-- outer table - no -->
  <tr><td>
    <div> <!-- *** context node *** -->
      <table> <!-- yes -->
        <tr><td>
          <table> ... </table> <!-- no -->
        </td></tr>
      </table>
      <table> <!-- yes -->
        <tr><td>
          <table> ... </table> <!-- no -->
        </td></tr>
      </table>
    </div>
  </td></tr>
</table>
__HTML__



# Select the context node (the div), then only its child tables.
for my $result ($tree->findnodes(q{/html/body/table/tr/td/div})) {
    print $result->as_HTML,"\n\n";
    for my $table( $result->findnodes(q{table}) ){ ## child::table
        print "$table\n";
        print $table->as_HTML,"\n\n\n";
    }

}

__END__
<div><table><tr><td><table><tr><td> ... </td></tr></table></td></tr></table><table><tr><td><table><tr><td> ... </td></tr></table></td></tr></table></div>


HTML::Element=HASH(0xc6c964)
<table><tr><td><table><tr><td> ... </td></tr></table></td></tr></table>



HTML::Element=HASH(0xc6cbf4)
<table><tr><td><table><tr><td> ... </td></tr></table></td></tr></table>
ricky

Well, if I understand it correctly, content_list can solve this:

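# $tree is an HTML::TreeBuilder::XPath tree built from the question's HTML.
# In scalar context findnodes returns a node set (a blessed array ref), so
# ->[1] is the second table in document order: the first "yes" table.
my $table_one = $tree->findnodes('/html//table')->[1];

# Print each child of that table, stopping at the first one that has a
# table as a direct child.
for ( $table_one->content_list ) {
    last if $_->exists('table');
    print $_->as_text;
}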

:)

Mantovani
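Building on the content_list idea, a pruned recursive walk gives the same selection as .//table except .//table//table without any XPath: record each table found below the context node and never look inside it. This is a sketch using HTML::TreeBuilder's content_list, tag and look_down methods; top_level_tables is a hypothetical helper name, not part of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Return the tables below $node, without descending into any table found.
sub top_level_tables {
    my ($node) = @_;
    my @found;
    for my $child ( $node->content_list ) {
        next unless ref $child;    # skip plain text segments
        if ( $child->tag eq 'table' ) {
            push @found, $child;   # keep it, but don't look inside it
        }
        else {
            push @found, top_level_tables($child);
        }
    }
    return @found;
}

my $tree = HTML::TreeBuilder->new_from_content(
      '<table><tr><td><div>'
    . '<table><tr><td>yes 1<table><tr><td>no</td></tr></table></td></tr></table>'
    . '<table><tr><td>yes 2<table><tr><td>no</td></tr></table></td></tr></table>'
    . '</div></td></tr></table>'
);

# Context node: the div. Tables enclosing it are never visited.
my $div    = $tree->look_down( _tag => 'div' );
my @tables = top_level_tables($div);

printf "%d table(s) selected\n", scalar @tables;
```

Because the walk starts at the context node, any number of tables above it never enters the picture, which is the property the question's EDIT asks for.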

What about .//table[not(.//table)]? Sorry for brevity, I'm on my phone.

Dominic Mitchell
  • Nope, that finds all tables that don't have tables in them. I want all tables that are not within tables. – Randal Schwartz Jan 13 '10 at 19:55
  • OK, how about .//table[not(ancestor::table)] ? That's quite likely to be inefficient though, unless you're doing it in something like eXist, which has the indexes to support it. – Dominic Mitchell Jan 15 '10 at 09:21
  • Nope. That finds all tables as long as they're not within *any* table. But consider what happens if our context node is already within a table. It'd find *nothing*. Nope, not the answer. – Randal Schwartz Jan 15 '10 at 13:48
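Both counterexamples can be checked against the question's structure with HTML::TreeBuilder::XPath, as in the answers above. In this minimal sketch, evaluated from the div context, the first attempt returns only the innermost "no" tables, and the second returns nothing because the div already sits inside the outer table.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# The question's structure: an outer table, the context div, two "yes"
# tables, each holding one nested "no" table.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content(
      '<table><tr><td><div>'
    . '<table><tr><td>yes 1<table><tr><td>no</td></tr></table></td></tr></table>'
    . '<table><tr><td>yes 2<table><tr><td>no</td></tr></table></td></tr></table>'
    . '</div></td></tr></table>'
);
my ($div) = $tree->findnodes('//div');

# Attempt 1: tables that contain no table. This picks the leaves,
# i.e. the two nested "no" tables.
my @leafless = $div->findnodes('.//table[not(.//table)]');

# Attempt 2: tables with no table ancestor at all. The context div
# already sits inside the outer table, so every candidate fails.
my @unenclosed = $div->findnodes('.//table[not(ancestor::table)]');

printf "not(.//table):        %d table(s) (%s)\n",
    scalar @leafless, join( ', ', map { $_->as_text } @leafless );
printf "not(ancestor::table): %d table(s)\n", scalar @unenclosed;
```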

I don't know how to get the context node to be evaluated in the nested predicates, but what you need is something like this:

descendant::table[not(ancestor::table[ancestor::div])]

only with the ability to reference the context node instead of div.

EDIT: If you set an XSLT variable for the context node,

<xsl:variable name="contextNode" select="." />

then you can reference it in the XPath predicate:

descendant::table[not(ancestor::table[ancestor::*[generate-id(.)=generate-id($contextNode)]])]
Mads Hansen

After investigating it here and elsewhere, the answer seems to be "you can't, and that's why we have XPath 2.0". Oh well.

Randal Schwartz