2

Is there any way to select nodes that contain only whitespace (  ,   , 	) using XPath?

Here is an example:

<doc>
    <p> </p>
    <p>   </p>
    <p>         </p>
    <p>text</p>
    <p> text</p>
    <p> text</p>
</doc>

I need to select the first 3 <p> nodes, which contain only whitespace.

sanjay

4 Answers

2

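Use translate() to remove the whitespace characters from the node's string value, then check that the remaining length is zero. That is true only for nodes that are empty or contain nothing but whitespace.

Demo: http://xsltransform.net/ejivdHb/22

Try the following: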

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<!-- Strip indentation between elements, but keep the whitespace inside p -->
<xsl:strip-space elements="*"/>
<xsl:preserve-space elements="p"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="p">
    <!-- Copy p only if nothing is left after removing space, tab, LF and CR -->
    <xsl:if test="string-length(translate(., ' &#9;&#xA;&#xD;','')) = 0">
        <xsl:copy-of select="."/>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>
ScanQR
2

To select p nodes that are not empty, but contain only whitespace characters, use:

/doc/p[string() and not(normalize-space())]

For example, the following stylesheet:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/doc">
    <xsl:copy>
        <xsl:copy-of select="p[string() and not(normalize-space())]"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

when applied to the following example input:

XML

<doc>
    <p/>
    <p> </p>
    <p>   </p>
    <p>         </p>
    <p>text</p>
    <p> text</p>
    <p> text</p>
</doc>

will return:

Result

<?xml version="1.0" encoding="UTF-8"?>
<doc>
   <p> </p>
   <p>   </p>
   <p>         </p>
</doc>
michael.hor257k
2

Note that the normal XML definition of whitespace does NOT include the NBSP character (xA0).

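To select nodes that contain one or more whitespace characters, and nothing else, where whitespace means x9, xA, xD, x20, and xA0, you can use the following (in XPath 2.0):

select="//*[matches(., '^[&#x9;&#xa;&#xd; &#xa0;]+$')]"

The ^ and $ anchors make the pattern cover the whole string value; without them, matches() succeeds as soon as any substring matches.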

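Alternatively, you might consider

select="//*[matches(., '^[\p{Z}]+$')]"

which matches many other space-like characters such as em-space, en-space, thin-space, hair-space, ideographic space, etc. Note that \p{Z} covers only the Unicode separator categories, so tab, line feed and carriage return are not included; [\s\p{Z}] would cover those as well.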
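
As a sketch of how a regex-based test like this might sit in a complete stylesheet, the following XSLT 2.0 example copies only the p children of doc whose string value consists entirely of tab, line feed, carriage return, space or NBSP. The /doc/p path and the template layout are assumptions taken from the question's sample input, not part of the answer itself.

```xml
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>

<!-- Assumes the question's sample structure: a doc root with p children -->
<xsl:template match="/doc">
    <xsl:copy>
        <!-- Keep p elements made up solely of tab, LF, CR, space or NBSP;
             ^ and $ anchor the pattern to the whole string value -->
        <xsl:copy-of select="p[matches(., '^[&#x9;&#xa;&#xd; &#xa0;]+$')]"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Run against the question's sample, this should keep the three whitespace-only p elements, provided the processor does not strip whitespace-only text nodes from the input.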

Michael Kay
-1

You can use the following XPath to select only nodes that contain whitespace:

//*[normalize-space(text())='']
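
A hedged sketch of how this expression might be narrowed for the question's input, assuming an XSLT 1.0 processor. In XPath 1.0, normalize-space(text()) looks only at the first text-node child, and an element with no text child at all also yields an empty string, so with //* the expression would also pick up empty elements and wrappers such as doc whose first text child is indentation. The /doc/p path and the extra [text()] predicate below are assumptions added for illustration.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>

<!-- Assumes the question's sample structure: a doc root with p children -->
<xsl:template match="/doc">
    <xsl:copy>
        <!-- [text()] requires at least one text child, which excludes an empty p;
             the second predicate checks that the first text child is whitespace-only -->
        <xsl:copy-of select="p[text()][normalize-space(text())='']"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

This still inspects only the first text child, so it suits elements like the sample's p that hold a single text node.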
Mahipal