
How can I select nodes that have at least one following sibling but no text node immediately after them, using a single XPath 1.0 expression?

For instance, from the following XML:

<p>This is some <b>forma</b><b>tted</b> text, this is <b>bold</b>.</p>

I want to extract the first <b> tag.

I have come up with the following expression so far:

//b[following-sibling::*[1][self::b]][not(text() = following-sibling::text()[1]/preceding-sibling::*[1][self::b]/text())]

However, it fails to extract the first tag when the adjacent `b` elements contain identical text, for example:

<p>I am hungry for <b>paw</b><b>paw</b>.</p>
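To see why the pawpaw case slips through, here is a rough sketch that mimics the expression above step by step. It uses only Python's standard `xml.dom.minidom`, because the stdlib `ElementTree` does not support the `following-sibling` axis. The helper names are hypothetical, and it assumes minidom's sibling lists map directly onto the XPath 1.0 data model, with CDATA sections treated as text. It is an emulation for illustration, not an XPath engine:

```python
from xml.dom import Node, minidom

# Assumption: CDATA sections count as text, as in the XPath 1.0 data model.
TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
ELEMENT = (Node.ELEMENT_NODE,)


def following(node):
    """following-sibling::node(), in document order."""
    sib = node.nextSibling
    while sib is not None:
        yield sib
        sib = sib.nextSibling


def preceding(node):
    """preceding-sibling::node(), nearest sibling first (reverse axis)."""
    sib = node.previousSibling
    while sib is not None:
        yield sib
        sib = sib.previousSibling


def first(nodes, types):
    """First node whose nodeType is in `types`, or None (the [1] predicate)."""
    return next((n for n in nodes if n.nodeType in types), None)


def texts(elem):
    """String values of elem/text()."""
    return {c.data for c in elem.childNodes if c.nodeType in TEXT_TYPES}


def emulate_question_xpath(doc):
    """Hypothetical emulation of
    //b[following-sibling::*[1][self::b]]
       [not(text() = following-sibling::text()[1]/preceding-sibling::*[1][self::b]/text())]
    """
    selected = []
    for b in doc.getElementsByTagName("b"):
        nxt = first(following(b), ELEMENT)
        if nxt is None or nxt.tagName != "b":
            continue  # following-sibling::*[1][self::b] is empty
        other = set()
        txt = first(following(b), TEXT_TYPES)
        if txt is not None:
            prev = first(preceding(txt), ELEMENT)
            if prev is not None and prev.tagName == "b":
                other = texts(prev)
        # Node-set "=" is existential: true if any pair of string values match.
        if not (texts(b) & other):
            selected.append(b.toxml())
    return selected


formatted = minidom.parseString(
    "<p>This is some <b>forma</b><b>tted</b> text, this is <b>bold</b>.</p>")
pawpaw = minidom.parseString("<p>I am hungry for <b>paw</b><b>paw</b>.</p>")
print(emulate_question_xpath(formatted))  # ['<b>forma</b>']
print(emulate_question_xpath(pawpaw))     # []
```

In the pawpaw case the element found by `following-sibling::text()[1]/preceding-sibling::*[1]` is the second `b`, and because both `b` elements contain `paw`, the `not(...)` test rejects the first one.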

Is there a better and simpler way?

Cuder
  • It seems that your source is HTML, not XML, so it would be better to set the correct tags. Also note that XPath with `p` HTML nodes [might behave differently](https://stackoverflow.com/questions/48548296/how-to-find-direct-children-of-element-in-lxml/48550674#48550674) than XPath with XML nodes – Andersson Feb 08 '18 at 11:53
  • No, it's XML, a part of a bigger XML file. `p` nodes do not actually matter, as there may be other XML nodes, like `action` or `foo-par`. `p` here is just an example. An expression should extract `b` from any parent node. – Cuder Feb 08 '18 at 12:45
  • This XPath works on both examples: `//b[1]/text()`, although it looks oversimplified. – LMC Feb 08 '18 at 14:52
  • @LuisMuñoz, this does not work, because 1) I need to find several `b` nodes, not the text inside them, and not only the first node; 2) these are just examples: `b` nodes may appear in various positions, not necessarily first, and the expression I am looking for should extract exactly the `b` tags that satisfy the condition I stated at the very beginning (first sentence). – Cuder Feb 08 '18 at 16:11
  • @Andersson: You have an insightful observation regarding `p` elements in HTML at the referenced link (+1), but the markup in Cuder's case is normal mixed content and [able to be selected via XPath](https://stackoverflow.com/a/54993857/290085). – kjhughes Mar 05 '19 at 13:33



This XPath,

//*[following-sibling::node()[1][not(self::text())]]

will select all elements that have an immediately following sibling that is not a text node.
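As a cross-check, the same condition can be emulated with Python's standard `xml.dom.minidom`: in DOM terms, `following-sibling::node()[1]` is just the element's `nextSibling`. This is a sketch under that assumption, not an XPath engine; `select_before_non_text` is a hypothetical name, and passing `tag="b"` corresponds to `//b[following-sibling::node()[1][not(self::text())]]`:

```python
from xml.dom import Node, minidom

# Assumption: CDATA sections count as text, as in the XPath 1.0 data model.
TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def select_before_non_text(doc, tag="*"):
    """Hypothetical emulation of //*[following-sibling::node()[1][not(self::text())]]
    (or //b[...] when tag="b")."""
    selected = []
    for elem in doc.getElementsByTagName(tag):
        nxt = elem.nextSibling  # following-sibling::node()[1]
        # Keep elem only if that sibling exists and is not a text node;
        # elements, comments and processing instructions all qualify.
        if nxt is not None and nxt.nodeType not in TEXT_TYPES:
            selected.append(elem.toxml())
    return selected


formatted = minidom.parseString(
    "<p>This is some <b>forma</b><b>tted</b> text, this is <b>bold</b>.</p>")
pawpaw = minidom.parseString("<p>I am hungry for <b>paw</b><b>paw</b>.</p>")
print(select_before_non_text(formatted, "b"))  # ['<b>forma</b>']
print(select_before_non_text(pawpaw, "b"))     # ['<b>paw</b>']
```

Unlike the expression in the question, this test never compares the text content of the `b` elements, so identical text such as `<b>paw</b><b>paw</b>` makes no difference.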

kjhughes