2
<table>
   <tr>
      <td>cell 1</td>
   </tr>
   <tr>
      <td><b>cell 2</b></td>
   </tr>
   <tr>
      <td>
         <table>
            <tr>
               <td><span>cell 3</span></td>
            </tr>
         </table>
      </td>
   </tr>
</table>

Can I use XPath to get <td>cell 1</td>, <td><b>cell 2</b></td> and <td><span>cell 3</span></td>, but not the outer <td><table>... (because it has a nested td inside)?

Note the inner table here is just an example. I want the deepest td elements, meaning they cannot have another td as a descendant.

XPath 1.0 is preferred so I can use lxml.

This is a similar question, but here I know I want td elements.

sourcream

2 Answers

5

//td will return every td in the document.

//td[not(.//td)] will return every td that does not contain (as one of its descendants) a td element.
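As a minimal sketch of how this might look with lxml (the library mentioned in the question), the snippet below runs both expressions against the sample markup. The variable names are only illustrative, and the markup is parsed as XML for simplicity; real-world HTML may need `lxml.html` instead.

```python
from lxml import etree

# Sample markup from the question, parsed here as well-formed XML.
MARKUP = """
<table>
   <tr><td>cell 1</td></tr>
   <tr><td><b>cell 2</b></td></tr>
   <tr>
      <td>
         <table>
            <tr><td><span>cell 3</span></td></tr>
         </table>
      </td>
   </tr>
</table>
""".strip()

root = etree.fromstring(MARKUP)

# Every td, including the outer one that wraps the nested table.
all_cells = root.xpath("//td")

# Only td elements that have no td among their descendants.
deepest_cells = root.xpath("//td[not(.//td)]")

# Collect the text of each deepest cell (itertext also reaches <b>, <span>).
deepest_text = ["".join(td.itertext()).strip() for td in deepest_cells]

print(len(all_cells))
print(deepest_text)
```

The outer td is counted by the first expression but filtered out by the second, since it contains the td that holds "cell 3".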

Siebe Jongebloed
Conal Tuohy
  • You probably meant `//`. – sourcream Jul 09 '22 at 13:55
  • @sourcream in the expression `//td[not(.//td)]` the path `.//td` in the predicate starts with a `.` which refers to the "context node" which in this case is a `td` identified by the `//td` at the start of the path. Equally, I could have written `//td[not(descendant::td)]` and probably that would have been clearer. – Conal Tuohy Jul 10 '22 at 01:20
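To illustrate why the leading `.` matters, here is a small sketch using lxml on a condensed copy of the question's markup (the helper `cell_texts` is a hypothetical name introduced only for this example). Without the dot, `//td` inside the predicate is evaluated from the document root, so the predicate is false for every td as long as the document contains any td at all.

```python
from lxml import etree

# Condensed version of the markup from the question.
root = etree.fromstring(
    "<table>"
    "<tr><td>cell 1</td></tr>"
    "<tr><td><b>cell 2</b></td></tr>"
    "<tr><td><table><tr><td><span>cell 3</span></td></tr></table></td></tr>"
    "</table>"
)

def cell_texts(expression):
    """Return the text content of every node matched by the expression."""
    return ["".join(node.itertext()) for node in root.xpath(expression)]

# Relative path: looks for td descendants of the td being tested.
relative = cell_texts("//td[not(.//td)]")

# Same test written with an explicit axis.
with_axis = cell_texts("//td[not(descendant::td)]")

# Absolute path in the predicate: "is there any td in the document?"
# That is always true here, so nothing is selected.
absolute = cell_texts("//td[not(//td)]")

print(relative)
print(with_axis)
print(absolute)
```

The first two lists are the same, while the third is empty.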
0

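Since a td must be part of a table, a td that contains no nested table cannot contain another td either. Checking for a descendant table should therefore select the same cells, and it may be faster, although that depends on the XPath engine and the document: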

//td[not(descendant::table)]
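
A rough sketch with lxml for comparing this expression against the `descendant::td` form. The second document is a made-up edge case, not taken from the question: a td that wraps an empty table. No timing is attempted here, so the speed claim is left untested.

```python
from lxml import etree

# Condensed version of the markup from the question.
sample = etree.fromstring(
    "<table>"
    "<tr><td>cell 1</td></tr>"
    "<tr><td><b>cell 2</b></td></tr>"
    "<tr><td><table><tr><td><span>cell 3</span></td></tr></table></td></tr>"
    "</table>"
)

# Hypothetical edge case: the nested table has no cells of its own.
edge_case = etree.fromstring(
    "<table><tr><td><table/></td></tr></table>"
)

NO_TD = "//td[not(descendant::td)]"
NO_TABLE = "//td[not(descendant::table)]"

# On the question's markup both expressions select the same elements.
same_on_sample = sample.xpath(NO_TD) == sample.xpath(NO_TABLE)

# On the edge case they differ: the td has no td descendant,
# but it does have a table descendant.
edge_no_td = len(edge_case.xpath(NO_TD))
edge_no_table = len(edge_case.xpath(NO_TABLE))

print(same_on_sample, edge_no_td, edge_no_table)
```

If nested tables without cells can occur in the input, the `descendant::td` form matches the stated requirement more literally.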
Siebe Jongebloed