XPath expression: selecting text nodes between element nodes

Question

Based on the following HTML, I want to extract TextA, TextC and TextE.

<div id='content'>
    TextA
    <br/>
    <br/>
    <p>TextB</p>
    TextC
    <br/>
    TextC
    <p>TextD</p>
    TextE
</div>

I tried to get TextC like so but I don't get the result I want:

Query:
//*[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]
Expected result:
["TextC", <br/>, "TextC"]
Actual result:
[<br/>]

Is there a way to select the text nodes without using indexes like //div/text()[1]?

Your question is very unclear. In the first line you say "I want to extract TextA, TextC, and TextE" but then later you talk about wanting to select `["TextC", <br/>, "TextC"]`. Please clearly explain what it is you're trying to do. – JLRishe, Sep 15 '16 at 17:10
Got it. I do want to extract all of the said text nodes, my query was just an example of how I tried to do it. — Michael Wyss, Sep 20 '16 at 15:49

score 4 · Accepted Answer · edited Sep 15 '16 at 13:01

The two text nodes aren't in the result of your XPath because * matches only elements. To match both elements and text nodes, you can use node() instead:

//node()[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]

Or, if you want the text nodes only (i.e., excluding <br/>), you can use text() instead of node():

//text()[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]
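To make the difference between `*`, `node()` and `text()` concrete, here is a rough sketch that uses Python's standard-library `xml.dom.minidom` instead of an XPath engine. The helper `nodes_between` is hypothetical (it is not part of the answer) and it assumes the two `<p>` markers are direct children of the same parent, as they are in the question's snippet.

```python
from xml.dom import minidom

# The question's snippet; it is well-formed, so an XML parser accepts it.
HTML = """<div id='content'>
    TextA
    <br/>
    <br/>
    <p>TextB</p>
    TextC
    <br/>
    TextC
    <p>TextD</p>
    TextE
</div>"""


def nodes_between(parent, start_text, end_text):
    """Hypothetical helper: child nodes of `parent` that lie between the
    <p> containing `start_text` and the <p> containing `end_text`."""
    selected, inside = [], False
    for node in parent.childNodes:
        if node.nodeType == node.ELEMENT_NODE and node.tagName == "p":
            text = "".join(
                c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE
            )
            if start_text in text:
                inside = True
                continue
            if end_text in text:
                break
        if inside:
            selected.append(node)
    return selected


div = minidom.parseString(HTML).documentElement
between = nodes_between(div, "TextB", "TextD")

# Roughly what node() selects: elements and text nodes alike.
all_nodes = between
# Roughly what * selects: element nodes only.
elements_only = [n for n in between if n.nodeType == n.ELEMENT_NODE]
# Roughly what text() selects: text nodes only (whitespace stripped for display).
text_only = [n.data.strip() for n in between if n.nodeType == n.TEXT_NODE]

print([n.tagName for n in elements_only])  # ['br']
print(text_only)                           # ['TextC', 'TextC']
```

The element-only filter reproduces the "actual result" from the question, and the unfiltered list has the three nodes the question expected.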

edited Sep 15 '16 at 13:01 by LarsH
answered Sep 15 '16 at 06:40 by har07

Fixed a copy-paste mistake in the code. +1. BTW @OP you may get better efficiency if you use `preceding-sibling` and `following-sibling` instead of `preceding` and `following`, assuming that you can be sure the `<p>` elements you're referring to are on the same level as the text node. You may even want `preceding-sibling::p[1]`, for greater specificity and efficiency, depending on how broadly you're going to apply this technique to different XML inputs. – LarsH Sep 15 '16 at 13:04
That's just what I've been looking for. Thanks! – Michael Wyss Sep 20 '16 at 15:51
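LarsH's sibling-axis suggestion could plausibly be written as `//text()[preceding-sibling::p[1][contains(.,"TextB")] and following-sibling::p[1][contains(.,"TextD")]]`; that exact expression is an assumption, since the comment does not spell it out. The sketch below imitates the same "nearest `<p>` sibling on each side" check with the standard-library `xml.dom.minidom`. The helper names `nearest_sibling_p` and `p_text` are made up for illustration.

```python
from xml.dom import minidom

# The question's snippet, repeated so this block is self-contained.
HTML = """<div id='content'>
    TextA
    <br/>
    <br/>
    <p>TextB</p>
    TextC
    <br/>
    TextC
    <p>TextD</p>
    TextE
</div>"""


def nearest_sibling_p(node, direction):
    """Hypothetical helper: walk siblings via `previousSibling` or
    `nextSibling` and return the first <p> element, or None."""
    sibling = getattr(node, direction)
    while sibling is not None:
        if sibling.nodeType == sibling.ELEMENT_NODE and sibling.tagName == "p":
            return sibling
        sibling = getattr(sibling, direction)
    return None


def p_text(p):
    """Concatenate the direct text children of a <p>; empty string for None."""
    if p is None:
        return ""
    return "".join(c.data for c in p.childNodes if c.nodeType == c.TEXT_NODE)


div = minidom.parseString(HTML).documentElement

# Keep text nodes whose nearest preceding <p> sibling contains "TextB"
# and whose nearest following <p> sibling contains "TextD".
matches = [
    n.data.strip()
    for n in div.childNodes
    if n.nodeType == n.TEXT_NODE
    and "TextB" in p_text(nearest_sibling_p(n, "previousSibling"))
    and "TextD" in p_text(nearest_sibling_p(n, "nextSibling"))
]

print(matches)  # ['TextC', 'TextC']
```

Only siblings of each text node are inspected, which is the narrowing the comment describes; it works here because the `<p>` elements sit at the same level as the text nodes.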
