I have a transformation to HTML (XHTML), declared as follows:

<?xml version='1.0' encoding='utf-8'?>
<xsl:stylesheet version="2.0" 

<xsl:output method="xhtml" encoding="UTF-8" omit-xml-declaration="yes" indent="no"/>

I am getting different encoding of the entities on serialization. When I output formatted text using the following code (where $converted-value is: Jul&nbsp;28,&nbsp;2015&nbsp;&nbsp;&nbsp;03:13:15&nbsp;p.m.&nbsp;EDT), sometimes the &nbsp; is encoded, and other times it is not.

<span style="white-space:pre;">
    Jan&nbsp;05,&nbsp;2016&nbsp;&nbsp;&nbsp;05:00:44&nbsp;p.m.&nbsp;EST
</span>

The difference shows up between running the transformation in oXygen and running it from a Java program. From oXygen, the entities are always output as &nbsp;, but in other cases (inconsistently) the output is encoded as &amp;nbsp;, as follows.

<span style="white-space:pre;">
    Jan&nbsp;05,&nbsp;2016&nbsp;&nbsp;&nbsp;05:00:44&nbsp;p.m.&nbsp;EST
</span>

or

<span style="white-space:pre;">
    Jul&amp;nbsp;28,&amp;nbsp;2015&amp;nbsp;&amp;nbsp;&amp;nbsp;03:13:15&amp;nbsp;p.m.&amp;nbsp;EDT
</span>

This behavior is inconsistent, both on the same machine and across machines. What controls it? It seems that disable-output-escaping doesn't always work. How can I write this so that the output is predictable?

Thanks!

D Olson
  • I think there are copy-paste errors in your code samples (the second sample is not well-formed XML). Could you correct them please? – Michael Kay Jan 13 '16 at 15:03

1 Answer

If the input is a sequence of six characters (&, n, b, s, p, ;) (as distinct from the single character denoted by &nbsp;) then it will be serialized as the 6-character string &nbsp; if disable-output-escaping is in force, or as the 10-character string &amp;nbsp; otherwise.
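The two outcomes described above can be reproduced with a small Java driver. This is only a sketch under several assumptions: it uses the XSLT 1.0 processor bundled with the JDK rather than Saxon, method="xml" rather than xhtml, and a hypothetical stylesheet parameter named converted-value. The class name DoeDemo and its helper methods are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DoeDemo {

    // Hypothetical stylesheet: writes the parameter into a <span>, with
    // disable-output-escaping set to the literal "yes" or "no".
    static String stylesheet(String doe) {
        return "<xsl:stylesheet version='1.0'"
             + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>"
             + "<xsl:param name='converted-value'/>"
             + "<xsl:template match='/'>"
             + "<span><xsl:value-of select='$converted-value'"
             + " disable-output-escaping='" + doe + "'/></span>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Runs the transformation and serializes straight to a character stream,
    // so the processor itself performs the serialization.
    static String run(String doe, String value) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet(doe))));
            t.setParameter("converted-value", value);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<doc/>")),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Six literal characters: & n b s p ;
        String sixChars = "Jul&nbsp;28";
        // d-o-e requested: the ampersand is written through unescaped.
        System.out.println(run("yes", sixChars));
        // d-o-e not requested: the ampersand is escaped as &amp;.
        System.out.println(run("no", sixChars));
    }
}
```

If the same stylesheet produces the first form in one environment and the second in another, the conditions listed in this answer are the first things to compare between the two.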

disable-output-escaping is in force if:

(a) you request it in the stylesheet, and

(b) the XSLT processor is performing serialization (rather than writing its output to, say, a DOM or JDOM tree); this depends on how the processor is invoked, so it may depend on how oXygen does things; and

(c) the instruction on which the d-o-e attribute appears is writing directly to the serializer. This won't be the case if, for example:

(i) the output of the instruction is captured in a variable; or

(ii) the instruction appears within a try/catch block.

Generally, use of disable-output-escaping is deprecated for these kinds of reasons. There is almost always a better way of achieving your desired goal.
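One possible alternative, offered as a sketch and not necessarily the approach intended here: if the formatted value is under your control, replace the six-character text &nbsp; with the real no-break space character (U+00A0) before it reaches the stylesheet, and drop disable-output-escaping entirely. The example again assumes the JDK's built-in XSLT 1.0 processor and a hypothetical parameter named converted-value; NbspWithoutDoe, toRealNbsp and render are illustrative names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NbspWithoutDoe {

    // Hypothetical stylesheet: a plain xsl:value-of, no disable-output-escaping.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>"
        + "<xsl:param name='converted-value'/>"
        + "<xsl:template match='/'>"
        + "<span style='white-space:pre;'>"
        + "<xsl:value-of select='$converted-value'/>"
        + "</span>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Turns the six-character text "&nbsp;" into the single character U+00A0.
    static String toRealNbsp(String s) {
        return s.replace("&nbsp;", "\u00A0");
    }

    // Transforms a dummy document, passing the value as a stylesheet parameter.
    static String render(String value) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setParameter("converted-value", value);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<doc/>")),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String value = toRealNbsp("Jul&nbsp;28,&nbsp;2015");
        // The value now contains no ampersand, so the serializer has nothing
        // to escape, whichever way the transformation is invoked.
        System.out.println(render(value));
    }
}
```

Because the result no longer depends on whether disable-output-escaping takes effect, it should not vary with how the transformation is invoked.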

Michael Kay
  • Michael, thanks for your response. As a point of clarification, this solution has been in place for a few years now, and after upgrading to Saxon9ee we've started seeing differences in the serialization when it is driven from Java: transformer.transform(new StreamSource(new ByteArrayInputStream(xmlData.getBytes("UTF-8"))), new StreamResult(strWriter) ); On some systems the &nbsp; is encoded, and on others it is not. We can't find any differences to account for the different serialization. – D Olson Jan 13 '16 at 18:41
  • If you can produce a repro for the problem, we'll be happy to look at it. – Michael Kay Jan 15 '16 at 16:54