Suppose I have the following simplified nested HTML list:
<ol>
<li>Item 1</li>
<li>Item 2
<ul>
<li>Item 2 1</li>
</ul>
</li>
<li>Item 3</li>
</ol>
and now I’d like to visit every text node while iterating over the list items:
for li in xml.xpath(".//li"):
    for t in li.xpath(".//text()"):
        print(t)
However, this prints Item 2 1 twice because that text node is a descendant of two li nodes. So I want to select only those text nodes whose nearest ancestor li is the current (context) list item, to avoid selecting text nodes in nested list items more than once. Something like
li.xpath(".//text[ancestor::li[1] == .]")
but that’s an invalid expression.
How do I do that? (This is using lxml, which builds on libxml2, which implements XPath 1.0.)
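For reference, here is a minimal sketch of one possible approach, not necessarily the best one. XPath 1.0 has no node-identity operator, but a union of two node-sets contains a single node exactly when both sides are the same node, so count(A | B) = 1 can stand in for the comparison. The sketch assumes the current li can be passed into the expression as an XPath variable through lxml's keyword arguments; the names DOC, own_text_nodes, texts_per_item and $ctx are made up for illustration, and whitespace-only text nodes are dropped on the Python side.

```python
from lxml import etree

# The list from the question, parsed as XML (it happens to be well-formed).
DOC = """<ol>
<li>Item 1</li>
<li>Item 2
<ul>
<li>Item 2 1</li>
</ul>
</li>
<li>Item 3</li>
</ol>"""


def own_text_nodes(li):
    """Return the text nodes whose nearest li ancestor is `li` itself.

    The union `ancestor::li[1] | $ctx` holds one node only when the
    nearest li ancestor and the context item are the same node.
    """
    return li.xpath(".//text()[count(ancestor::li[1] | $ctx) = 1]", ctx=li)


def texts_per_item(root):
    """Collect the stripped, non-empty own text of every li, in document order."""
    result = []
    for li in root.xpath(".//li"):
        result.append([t.strip() for t in own_text_nodes(li) if t.strip()])
    return result


root = etree.fromstring(DOC)
for texts in texts_per_item(root):
    print(texts)
```

With this predicate, text nested inside inline elements of the same li is still selected, while anything inside a nested li is left for that nested item's own iteration.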
…