Consider the following HTML:

<div>
  text1
</div>
<div>
  <span>
    text2
  </span>
</div>
<div>
  text3
</div>

I need to select all the nodes with text1/text2/text3. When I use

/html/body/div[position() > 0]

I obviously don't get the span around text2, but the div around <span>text2</span>. How can I say: if the div contains a span, return the span; if the div is already the last element in the path, return the div itself? So the intended nodes would be:

div[0]
div[1]/span
div[2]

Update: This one works, but is there a shorter way to do it? (E.g. I am writing /html/body/div in both of them; is it possible to put the pipe symbol (or) at a later place in the path?)

/html/body/div[position() > 0 and count(*) = 0] | /html/body/div[position() > 0]/span
stefan.at.kotlin
  • 1
    You may take a look at this post: https://stackoverflow.com/questions/14631590/get-text-content-of-an-html-element-using-xpath . Using /html/body/div/text() should give you what you need, though I may have misunderstood what you want. – Gaël Barbin Apr 10 '21 at 14:30
  • 2
    What version of XPath? In XPath 2.0+, you can do `/html/body/div/(span | self::div[not(span)])`, but XPath 1.0 doesn't support that syntax, so you're either stuck with `/html/body/div[not(span)] | /html/body/div/span` or first select all `/html/body/div` and then select `span | self::div[not(span)]` from there. – JLRishe Apr 10 '21 at 14:35
  • 1
    @Gaël Thanks, it worked with `/html/body/div//text()` (note the two `//` before `text()`) :-) Please consider posting as answer. – stefan.at.kotlin Apr 10 '21 at 14:41
  • @JLRishe Thanks, I had to look up the version, but as I am in a browser context (Chrome) it's 1.0 as I learned. But yes, the one for version 2 would have been what I expected. Unfortunately I am limited to 1.0 ): You could also post as an answer for those able to use 2.0 – stefan.at.kotlin Apr 10 '21 at 14:42
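
The two-step approach from the comment above (first select every `/html/body/div`, then pick `span | self::div[not(span)]` relative to each one) can be sketched outside the browser as well. This is only an illustration in Python, assuming the snippet from the question is wrapped in `html`/`body` and is well-formed XML. `xml.etree.ElementTree` implements only a subset of XPath 1.0 (no union operator, no `self::` axis), so the second step is written as plain Python rather than as an XPath expression.

```python
import xml.etree.ElementTree as ET

# The markup from the question, wrapped in html/body (an assumption).
HTML = """<html><body>
<div>
  text1
</div>
<div>
  <span>
    text2
  </span>
</div>
<div>
  text3
</div>
</body></html>"""

root = ET.fromstring(HTML)

selected = []
# Step 1: the equivalent of /html/body/div (root is the html element).
for div in root.findall("./body/div"):
    # Step 2: emulate `span | self::div[not(span)]` relative to each div.
    spans = div.findall("span")
    selected.extend(spans if spans else [div])

print([el.tag for el in selected])  # ['div', 'span', 'div']
```

In Chrome the same two steps would go through `document.evaluate`, once for the divs and once per div with that div as the context node.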

1 Answer

To select the text content of a node, you can use the text() node test, which matches text nodes rather than the elements that contain them.

So if you want to select all text nodes below a root node, you can use this XPath expression:

//ROOT_NODE//text()

So, for your example and as you said in your comment:

/html/body/div//text()
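
To see what the descendant text lookup returns for the markup in the question, here is a minimal sketch in Python. It assumes the snippet is wrapped in `html`/`body` and parses as XML. Python's standard `xml.etree.ElementTree` has no `text()` node test, so `itertext()` stands in for `//text()`; whitespace-only chunks are dropped here, whereas the XPath expression would also return them as text nodes.

```python
import xml.etree.ElementTree as ET

# The markup from the question, wrapped in html/body (an assumption).
HTML = """<html><body>
<div>
  text1
</div>
<div>
  <span>
    text2
  </span>
</div>
<div>
  text3
</div>
</body></html>"""

root = ET.fromstring(HTML)

# itertext() walks every text chunk below each div, at any depth,
# which is what the double slash in div//text() asks for.
texts = [
    chunk.strip()
    for div in root.findall("./body/div")
    for chunk in div.itertext()
    if chunk.strip()  # skip whitespace-only chunks between tags
]

print(texts)  # ['text1', 'text2', 'text3']
```

Note that this yields the text itself, not the enclosing div or span elements the question originally asked for.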

Gaël Barbin