Select nodes containing mixed content or just text with XPath

Question

Using XPath 1.0 and XSLT 1.0 I need to select direct parents of mixed content or just text. Consider the following example:

<table class="dont-match">
    <tr class="dont-match">
        <td class="match">Mixed <strong class="maybe-match">content</strong> in here.</td>
        <td class="match">Plain text in here.</td>
        <td class="dont-match"><img src="..." /></td>
    </tr>
</table>
<div class="dont-match">
    <div class="dont-match"><img src="..." /></div>
    <div class="match">Mixed <em class="maybe-match">content</em> in here.</div>
    <p class="match">Plain text in here.</p>
</div>

Obviously the classes match, maybe-match and dont-match are for demonstration purposes only and are not available for matching. maybe-match means it would be better not to match these elements, but I can handle that myself if they are difficult to exclude.

Many thanks in advance!

score 2 · Answer 1 · answered Jul 05 '12 at 09:06


To get the matches and maybe-matches you could use

 //*[count(text())>=1]

if your XML parser ignores whitespace-only text nodes, or otherwise

//*[normalize-space(string(./text())) != ""]

The maybe-matches could be filtered out by checking whether any ancestor matches, but then it becomes ugly (shown only for the case where whitespace is kept as text nodes):

//*[(normalize-space(string(./text())) != "") and count(./ancestor::*[normalize-space(string(./text())) != ""]) = 0]

BeniBela


The first one selects just everything, probably due to the handling of white-space, but the second and especially third solve my problem. Thanks! – hielsnoppe Jul 05 '12 at 09:16
I just found that if I have something like `\nfoobar` the `` matches, but not the ``. Based on your suggestions I rewrote yours to `node()[count(./text()[(normalize-space() != '')]) > 0]` which works fine for me. – hielsnoppe Jul 05 '12 at 12:07
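
The behaviour described in the comment above is consistent with how XPath 1.0 converts a node-set to a string: string() uses only the first node in document order, so `string(./text())` never looks past an element's first text node. The Python sketch below does not evaluate XPath. It re-implements both predicates by hand on top of the standard library's ElementTree, and the sample markup and helper names are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"  # the characters normalize-space() treats as whitespace

def direct_text_nodes(elem):
    """Return the element's direct text children in document order."""
    nodes = []
    if elem.text:
        nodes.append(elem.text)
    for child in elem:
        if child.tail:  # text that follows a child element
            nodes.append(child.tail)
    return nodes

def first_text_nonblank(elem):
    # Emulates normalize-space(string(./text())) != "":
    # only the first text node is ever inspected.
    nodes = direct_text_nodes(elem)
    return bool(nodes) and nodes[0].strip(XML_WS) != ""

def any_text_nonblank(elem):
    # Emulates count(./text()[normalize-space() != '']) > 0:
    # every direct text node is inspected.
    return any(t.strip(XML_WS) for t in direct_text_nodes(elem))

# Hypothetical sample: the first text node of <p> is just a newline.
doc = ET.fromstring("<p>\n<b>foo</b>bar</p>")

first_only = [e.tag for e in doc.iter() if first_text_nonblank(e)]
any_node = [e.tag for e in doc.iter() if any_text_nonblank(e)]

print(first_only)  # ['b']
print(any_node)    # ['p', 'b']
```

In this sample the outer element is missed by the first-node check because its leading text node is whitespace-only, while the check over all text nodes finds the trailing "bar".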

Dimitre Novatchev · Accepted Answer · 2012-07-05T15:28:37.880

For "match" use:

//*[text()[normalize-space()] and not(../text()[normalize-space()])]

For "maybe-match" use:

//*[../text()[normalize-space()]]

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
     <xsl:copy-of select=
      "//*[text()[normalize-space()] and not(../text()[normalize-space()])]"/>
==========
   <xsl:copy-of select="//*[../text()[normalize-space()]]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML (wrapped in a single top element to make it a well-formed XML document):

<t>
<table class="dont-match">
    <tr class="dont-match">
        <td class="match">Mixed <strong class="maybe-match">content</strong> in here.</td>
        <td class="match">Plain text in here.</td>
        <td class="dont-match"><img src="..." /></td>
    </tr>
</table>
<div class="dont-match">
    <div class="dont-match"><img src="..." /></div>
    <div class="match">Mixed <em class="maybe-match">content</em> in here.</div>
    <p class="match">Plain text in here.</p>
</div>
</t>

each of the two XPath expressions is evaluated and the selected nodes are copied to the output:

<td class="match">Mixed <strong class="maybe-match">content</strong> in here.</td>
<td class="match">Plain text in here.</td>
<div class="match">Mixed <em class="maybe-match">content</em> in here.</div>
<p class="match">Plain text in here.</p>
==========
   <strong class="maybe-match">content</strong>
<em class="maybe-match">content</em>

As we can see, both expressions select exactly the wanted elements.
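
For readers without an XSLT processor at hand, the logic of the two expressions can be sketched in Python. This is only an emulation under stated assumptions: as far as I know, the standard library's ElementTree supports just a subset of XPath and cannot evaluate these predicates directly, so the sketch re-implements "has a non-whitespace text child" and "parent has a non-whitespace text child" by hand. The helper names `has_text` and `parent_of` are hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<t>
<table class="dont-match">
    <tr class="dont-match">
        <td class="match">Mixed <strong class="maybe-match">content</strong> in here.</td>
        <td class="match">Plain text in here.</td>
        <td class="dont-match"><img src="..." /></td>
    </tr>
</table>
<div class="dont-match">
    <div class="dont-match"><img src="..." /></div>
    <div class="match">Mixed <em class="maybe-match">content</em> in here.</div>
    <p class="match">Plain text in here.</p>
</div>
</t>"""

XML_WS = " \t\r\n"  # the characters normalize-space() treats as whitespace

def has_text(elem):
    """Emulates text()[normalize-space()]: any non-blank direct text child."""
    texts = [elem.text] + [child.tail for child in elem]
    return any(t and t.strip(XML_WS) for t in texts)

root = ET.fromstring(XML)
# ElementTree elements have no parent pointer, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}

match, maybe = [], []
for elem in root.iter():
    parent = parent_of.get(elem)  # None for the top element
    parent_has_text = parent is not None and has_text(parent)
    # //*[text()[normalize-space()] and not(../text()[normalize-space()])]
    if has_text(elem) and not parent_has_text:
        match.append(elem)
    # //*[../text()[normalize-space()]]
    if parent_has_text:
        maybe.append(elem)

print([e.tag for e in match])  # ['td', 'td', 'div', 'p']
print([e.tag for e in maybe])  # ['strong', 'em']
```

Note that the second expression tests only the parent, so any element sitting inside mixed content would be selected, whether or not it contains text itself.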
