I'm running into an issue with emojis when generating HTML output through an XSL transformation under certain circumstances.
For instance, I've tested the following XSL with different transformation engines:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>
  <xsl:template match="/">
    <xsl:text disable-output-escaping="yes">&lt;!doctype html&gt;</xsl:text>
    <html>
      <head>
        <meta charset="UTF-8"/>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
      </head>
      <body>
        <textarea></textarea><br/>
        <input type="text" value=""/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
I ran exactly the same JAXP-based code for all transformers, changing only the transformer factory class name.
Saxon gives the correct result.
The JDK's internal repackaged transformer based on Xalan (com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl) produces correct output when the emoji is placed as text in the textarea body, but generates a wrong result for the <input> field: the emoji appears to be incorrectly encoded when it is placed in the value attribute.
Xalan 2.7.2 gives an even worse result.
For various reasons (mainly licensing), I would prefer to use the Xalan transformer. Any idea how I can make Xalan handle emojis correctly?
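To make "incorrectly encoded" concrete, here is a small sketch of one plausible failure mode. The post does not confirm that this is what either transformer emits, so treat it as an assumption. Emojis outside the Basic Multilingual Plane are stored in Java strings as a surrogate pair, and a serializer that escapes each UTF-16 code unit separately would write two invalid character references instead of one. The class name, the method names, and the choice of U+1F600 are illustrative only.

```java
class SurrogateNcrDemo {
    /** Escapes non-ASCII text one Unicode code point at a time (one reference per character). */
    static String escapeByCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    /** Escapes one UTF-16 code unit at a time, which splits supplementary characters in two. */
    static String escapeByCodeUnit(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append("&#").append((int) c).append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600 as a surrogate pair (assumed sample emoji)
        System.out.println(escapeByCodePoint(emoji)); // prints &#128512;
        System.out.println(escapeByCodeUnit(emoji));  // prints &#55357;&#56832;
    }
}
```

A browser can render the first form, while the second refers to lone surrogates, which are not valid characters in HTML. Comparing the raw bytes of the value attribute against these two patterns may help narrow down what each engine is doing.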
EDIT
The transformation is performed with the following code:
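import java.io.OutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.dom4j.io.DocumentSource;

TransformerFactory factory = TransformerFactory.newInstance(
        "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl",
        null);
Transformer transformer = factory.newTransformer(new StreamSource(xsl));
DocumentSource domSource = new DocumentSource(doc);
OutputStream stream = response.getOutputStream();
transformer.transform(domSource, new StreamResult(stream));
stream.flush();
stream.close();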
where doc is a dom4j document, xsl is the InputStream containing the stylesheet above, and response is an HttpServletResponse object that receives the transformation result.
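For anyone who wants to reproduce this outside a servlet container, the transformation code can be reduced to a standalone sketch along the following lines. Several things here are assumptions rather than part of the original setup: the emoji is U+1F600, the input is a trivial `<root/>` document read through a StreamSource instead of a dom4j DocumentSource, the result goes to a ByteArrayOutputStream instead of the servlet response, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EmojiTransformRepro {
    // Assumed sample emoji: U+1F600, written as a surrogate pair to keep the source ASCII.
    static final String EMOJI = "\uD83D\uDE00";

    // Trimmed-down version of the stylesheet, with the emoji in both places.
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" encoding=\"UTF-8\"/>"
        + "<xsl:template match=\"/\">"
        + "<html><body>"
        + "<textarea>" + EMOJI + "</textarea>"
        + "<input type=\"text\" value=\"" + EMOJI + "\"/>"
        + "</body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet with the given factory class; null selects the JAXP default. */
    static String transform(String factoryClassName) throws Exception {
        TransformerFactory factory = (factoryClassName == null)
                ? TransformerFactory.newInstance()
                : TransformerFactory.newInstance(factoryClassName, null);
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(XSL)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new StreamSource(new StringReader("<root/>")),
                new StreamResult(out));
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Roughly classifies how the emoji ended up in the value attribute. */
    static String describeValueAttribute(String html) {
        if (html.contains("value=\"" + EMOJI + "\"")) {
            return "literal character";
        }
        if (html.contains("value=\"&#128512;\"")
                || html.toLowerCase().contains("value=\"&#x1f600;\"")) {
            return "single character reference";
        }
        return "something else (possibly split surrogates)";
    }

    public static void main(String[] args) throws Exception {
        // Pass a factory class name as the first argument to compare engines.
        String html = transform(args.length > 0 ? args[0] : null);
        System.out.println(html);
        System.out.println("value attribute: " + describeValueAttribute(html));
    }
}
```

Running it once per factory class name (with the matching jar on the classpath for Saxon or Xalan 2.7.2) gives a side-by-side comparison without involving the browser or the servlet's response encoding.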
– morbac Oct 28 '22 at 09:33