XPath to return string concatenation of qualifying child node values

Question

Can anyone please suggest an XPath expression format that returns a string value containing the concatenated values of certain qualifying child nodes of an element, but ignoring others:

<div>
    This text node should be returned.
    <em>And the value of this element.</em>
    And this.
    <p>But this paragraph element should be ignored.</p>
</div>

The returned value should be a single string:

This text node should be returned. And the value of this element. And this.

Is this possible in a single XPath expression?

Thanks.

score 29 · Answer 1 · answered Sep 10 '09 at 13:55

In XPath 2.0:

string-join(/*/node()[not(self::p)], '')
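To see what this expression yields on the sample, here is a rough Python emulation. It uses only the standard library's `xml.etree.ElementTree`, which has no XPath 2.0 support of its own, so the `node()[not(self::p)]` step is done by hand. It assumes the document contains no comments or processing instructions (which `node()` would also match), and the helper name `child_node_strings` is made up for the sketch. Whitespace-only text nodes are part of the result, so the raw output contains newlines and indentation until you normalize it.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
SAMPLE = """<div>
    This text node should be returned.
    <em>And the value of this element.</em>
    And this.
    <p>But this paragraph element should be ignored.</p>
</div>"""


def child_node_strings(elem, skip_tag):
    """Yield the string value of each child node of elem in document order,
    skipping child elements named skip_tag (mirrors node()[not(self::p)])."""
    if elem.text:                   # leading text node child
        yield elem.text
    for child in elem:
        if child.tag != skip_tag:   # element child: string value = all descendant text
            yield "".join(child.itertext())
        if child.tail:              # text node that follows the child element
            yield child.tail


root = ET.fromstring(SAMPLE)        # root is the <div>, i.e. /*
joined = "".join(child_node_strings(root, "p"))  # like string-join(..., '')
print(repr(joined))
# Collapse whitespace runs to get the single-line form shown in the question.
print(" ".join(joined.split()))
```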

answered Sep 10 '09 at 13:55

Dimitre Novatchev

But a nested function call is not supported in string-join(), e.g. string-join(normalize-space(//a[@class="title"]//text())) – Learner Oct 13 '15 at 06:41
@SIslam, it is not a "nested function" problem; it's just that `normalize-space()` takes a single argument, not a sequence. You can use this expression instead: `string-join(//a[@class='title']/normalize-space())`. Of course, you must add a second argument to the call of `string-join()` – Dimitre Novatchev Oct 14 '15 at 02:59
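The point of that fix is that the path step `/normalize-space()` runs once per selected element and yields a sequence of strings, whereas `normalize-space(//a...)` tries to pass a whole sequence as one argument. A minimal Python sketch of the per-node pattern, using `xml.etree.ElementTree` as a stand-in host: the sample markup and the `' '` separator are assumptions for illustration, and the hypothetical `normalize_space` helper only approximates the XPath function (Python's `split()` treats a few more characters as whitespace).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with two class="title" links (not from the question).
DOC = """<body>
  <a class="title">  First
     title </a>
  <a class="other">skip me</a>
  <a class="title"><b>Second</b>   title</a>
</body>"""


def normalize_space(s):
    """Approximate XPath normalize-space(): trim ends, squeeze inner whitespace runs."""
    return " ".join(s.split())


root = ET.fromstring(DOC)
# One normalized string per selected element, like //a[@class='title']/normalize-space() ...
titles = [normalize_space("".join(a.itertext()))
          for a in root.iterfind(".//a[@class='title']")]
# ... then join the sequence, like string-join(..., ' ').
result = " ".join(titles)
print(titles)
print(result)
```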

Tomalak · Accepted Answer · 2014-10-22T12:08:03.107

In XPath 1.0:

You can use

/div//text()[not(parent::p)]

to capture the wanted text nodes. The concatenation itself cannot be done in XPath 1.0; I recommend doing it in the host application.
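A minimal sketch of doing the concatenation in the host application, assuming Python with the standard `xml.etree.ElementTree` module. ElementTree cannot evaluate `text()` steps, so the text-node selection is emulated by hand: it stores a text node either as `elem.text` or as the `tail` of the preceding child, and the hypothetical helper `text_nodes` pairs each piece with its XPath parent so the `not(parent::p)` test can be applied.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
SAMPLE = """<div>
    This text node should be returned.
    <em>And the value of this element.</em>
    And this.
    <p>But this paragraph element should be ignored.</p>
</div>"""


def text_nodes(elem):
    """Yield (parent, text) for every text node under elem, in document order.

    Leading text lives in elem.text; text after a child element lives in
    child.tail, but its XPath parent is elem, not the child.
    """
    if elem.text:
        yield elem, elem.text
    for child in elem:
        yield from text_nodes(child)
        if child.tail:
            yield elem, child.tail


div = ET.fromstring(SAMPLE)
# Equivalent of /div//text()[not(parent::p)]: drop text nodes whose parent is <p>.
wanted = [text for parent, text in text_nodes(div) if parent.tag != "p"]
# Concatenate in the host language, then collapse whitespace into one line.
result = " ".join("".join(wanted).split())
print(result)  # This text node should be returned. And the value of this element. And this.
```

With an XPath 1.0 engine such as lxml you could evaluate the expression directly and join the returned strings the same way; collapsing whitespace is optional and only reproduces the single-line form from the question.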

edited Oct 22 '14 at 12:08

answered Sep 10 '09 at 09:30

Tomalak

Thanks - you're absolutely right. I just read the XPath reference and discovered that all string functions implicitly work on the first node in a node-set, so there's consequently no way to combine selection and concatenation. – Tim Coulter Sep 10 '09 at 10:02
Lovely and elegant. Good on you! – Aaron Feb 04 '12 at 06:49

score 6 · Answer 3 · edited Oct 22 '15 at 21:03

This looks like it works:

Using /div as the context node:

text() | em/text()

Or without the use of context:

/div/text() | /div/em/text()

If you want to concat the first two strings, use this:

concat(/div/text(), /div/em/text())
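Why only the first two pieces appear: in XPath 1.0, `concat()` converts each node-set argument to a string, and that conversion takes only the first node in document order. A rough Python emulation on the question's sample (standard-library ElementTree as a stand-in; the helper names `text_children` and `xpath1_string` are hypothetical) shows that `And this.` is dropped.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
SAMPLE = """<div>
    This text node should be returned.
    <em>And the value of this element.</em>
    And this.
    <p>But this paragraph element should be ignored.</p>
</div>"""


def text_children(elem):
    """Text node children of elem in document order: elem.text, then each child's tail."""
    texts = [elem.text] + [child.tail for child in elem]
    return [t for t in texts if t]


def xpath1_string(node_texts):
    """XPath 1.0 node-set to string: value of the FIRST node, or '' if empty."""
    return node_texts[0] if node_texts else ""


div = ET.fromstring(SAMPLE)
em = div.find("em")
# concat(/div/text(), /div/em/text()): each argument is reduced to its first node.
result = xpath1_string(text_children(div)) + xpath1_string(text_children(em))
print(" ".join(result.split()))  # This text node should be returned. And the value of this element.
```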

edited Oct 22 '15 at 21:03

kenorb

answered Sep 10 '09 at 08:13

Guillermo

Thanks. This is a good step in the right direction. But I can't see how to concatenate the results. When I wrap this in a call to the string() function, it only returns the value of the first selected node. – Tim Coulter Sep 10 '09 at 08:25
Yes, and, as you can see, my solution does the same as the "correct" solution. ¬¬ You can concat(...) nodes, but you won't see the third text node. Try this: concat(/div/text(), /div/em/text()) – Guillermo Sep 10 '09 at 10:17

score 6 · Answer 4 · answered Sep 10 '09 at 08:14

/div//text()

The double slash selects text nodes at any depth, regardless of intermediate elements.

answered Sep 10 '09 at 08:14

Dewfy

This is kind of related and handy to know. Thanks. – Aaron Feb 04 '12 at 06:50

score 0 · Answer 5 · answered Jun 18 '13 at 00:57

If you want all children except p, you can try the following...

    string-join(//*[name() != 'p']/text(), "")

which returns...

This text node should be returned.
And the value of this element.
And this.

answered Jun 18 '13 at 00:57

Rodney P. Barbati

score 0 · Answer 6 · answered Dec 31 '19 at 08:29

I know this comes a bit late, but I figure my answer could still be relevant. I recently ran into a similar problem. Because I use Scrapy with Python 3.6, and Scrapy's selectors do not support XPath 2.0, I could not use the string-join function suggested in several online answers.

I ended up finding a simple workaround (shown below) that I did not see in any of the Stack Overflow answers, which is why I'm sharing it.

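# response is the Scrapy Response object passed to the spider callback
temp_selector_list = response.xpath('/div')
# XPath 1.0 selects the text nodes; Python's str.join does the concatenation
string_result = [''.join(x.xpath(".//text()").extract()) for x in temp_selector_list]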

Hope this helps!

score -2 · Answer 7 · answered May 21 '13 at 15:00

You could also use an XSLT for-each loop and assemble the values in a variable, like this:

<xsl:variable name="newstring">
  <xsl:for-each select="/div//text()">
    <xsl:value-of select="."/>
  </xsl:for-each>
</xsl:variable>

answered May 21 '13 at 15:00

user2406081

Not relevant. Poster asked about XQuery. – Alberto Jul 27 '16 at 20:11
