If I apply the following XSLT

<xsl:stylesheet version="2.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" indent="yes"/>

    <xsl:template match="*">
        <xsl:copy><xsl:apply-templates/></xsl:copy>
    </xsl:template>

    <xsl:template match="b/*">
        <xsl:copy><xsl:apply-templates/></xsl:copy>
    </xsl:template>

    <xsl:template match="text()">text</xsl:template>

</xsl:stylesheet>

to the following XML

<?xml version="1.0"?>

<a>
   <b></b>
</a>

the output is

<a>
    text
    <b></b>
    text
</a>

What I don't get: all the empty text nodes between the elements get processed, except the empty text node inside the element b. I don't see any difference in how the child nodes of a and b are processed.

Äxel
  • There is no such thing as an empty text node. A text node always has at least one character of data. – michael.hor257k Sep 19 '18 at 19:12
  • @michael.hor257k, that is not true, see https://www.w3.org/TR/xslt-30/#element-text: "An `xsl:text` element may be empty, in which case the result of evaluating the instruction is a text node whose string value is the zero-length string". So at least constructing empty text nodes is possible. – Martin Honnen Sep 19 '18 at 19:29
  • @MartinHonnen The above is a verbatim quote from the XPath 1.0 specification. – michael.hor257k Sep 19 '18 at 19:55

2 Answers

Actually, at least in the XSLT 2.0/3.0 model, a zero-length text node can exist, but only if it is parentless; as soon as you try to attach it to a parent element, it disappears. So if you do:

<xsl:variable name="x" as="node()">
  <xsl:text/>
</xsl:variable>

then count($x) returns 1, $x instance of text() returns true, and string-length($x) returns 0. But when you do

<xsl:variable name="e" as="node()">
   <e><xsl:copy-of select="$x"/></e>
</xsl:variable>

then count($e/child::node()) returns 0. This is defined by the rules for Constructing Complex Content (§5.7.1 in XSLT 3.0, rule 6): "Zero-length text nodes within the sequence are removed."

And the XDM data model defines a constraint (§6.7.1 rule 1 in the 3.1 version): "If the parent of a text node is not empty, the Text Node must not contain the zero-length string as its content."

Note that the W3C specs consistently use the word "empty" to refer to a set that has no members, while a string that has no characters is always called "zero-length". In my example above, $x is zero-length but it is not empty.

The situation in XPath 1.0 / XSLT 1.0 is different. Parentless text nodes cannot arise in 1.0, therefore zero-length text nodes cannot ever exist.

Michael Kay

There is no empty text node inside the b element; it is an empty element that has no child nodes at all. The a element, on the other hand, has three child nodes: the first is a text node containing whitespace (at least a line break and some space or tab characters), the second is the b element, and the third is another whitespace text node (at least a line break).

Also, where did you get that result, with the indentation of the text output you have shown? At http://xsltransform.hikmatu.com/94hvTyG I get the output <a>text<b></b>text</a>
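The child-node counts described above can be checked outside XSLT as well. The following is a minimal sketch using Python's standard `xml.dom.minidom` parser, which builds a W3C DOM tree rather than the XPath/XDM tree an XSLT processor uses, so it only illustrates the tree structure, not the transformation itself. The variable names are illustrative.

```python
from xml.dom import minidom

# The input document from the question, unchanged.
SOURCE = """<?xml version="1.0"?>

<a>
   <b></b>
</a>
"""

doc = minidom.parseString(SOURCE)
a = doc.documentElement
b = a.getElementsByTagName("b")[0]

# <a> has three children: whitespace text, the <b> element, whitespace text.
a_children = [child.nodeName for child in a.childNodes]
print(a_children)          # ['#text', 'b', '#text']

# <b></b> has no children at all, so there is nothing for a
# text() template to match inside it.
print(len(b.childNodes))   # 0

# By contrast, <b> </b> does contain one whitespace-only text node.
spaced = minidom.parseString("<b> </b>").documentElement
print(len(spaced.childNodes))  # 1
```

The two whitespace text nodes under a are what produce the two occurrences of "text" in the output; b contributes none because it has no text node child to match.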

Martin Honnen
  • That's a piece of information I did not find at all. I thought that `<b></b>` (no space) and `<b> </b>` (space) are equivalent. It's not really obvious. For example, an empty string in Java exists, no matter if it's empty or not. But thank you. I formatted the output for better reading. There were no line breaks. – Äxel Sep 19 '18 at 21:06
  • No, `<b></b>` and `<b> </b>` are not equivalent, unless you also use `xsl:strip-space`. As for the Java comparison, I am not sure I understand it, but if you look at various object models in Java to represent XML trees (e.g. W3C DOM, JDOM, XOM) then I am sure they will have an empty collection of child nodes for the `<b></b>` element, which of course doesn't mean you cannot, on the other hand, compute the "string value" of that element and get an empty string, e.g. in the W3C DOM Level 3 `getTextContent()` would give you an empty string. – Martin Honnen Sep 20 '18 at 08:07