But this will forcefully change encoding to UTF-8, but I need the value same as present in actual XML document.
From the point of view of XML, it makes no difference which encoding is used, as long as characters the encoding cannot represent are escaped (which the XSLT processor does for you). Every XML processor is required to support UTF-8 and UTF-16, and US-ASCII is supported almost universally in practice. The latter can be useful, for instance, if your XML must be transferred using old techniques that would otherwise mangle the UTF encoding (some older FTP systems, for instance).
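To illustrate the escaping mentioned above, here is a minimal sketch (the element name and sample text are made up for illustration). With US-ASCII as the output encoding, the serializer has to write any non-ASCII character as a character reference, so the document content is unchanged even though the bytes differ:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- US-ASCII cannot represent the e-acute directly, so the serializer escapes it -->
  <xsl:output method="xml" encoding="US-ASCII"/>

  <xsl:template match="/">
    <!-- serialized as caf&#233; or an equivalent reference such as caf&#xE9; -->
    <word>café</word>
  </xsl:template>

</xsl:stylesheet>
```

Any XML parser reading the result will see the same four characters it would have seen in a UTF-8 or UTF-16 serialization.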
That said, in XSLT 2.0 and 3.0 there are ways of doing this dynamically, by using xsl:result-document together with a trick: loading the XML as unparsed text and reading the encoding from its XML declaration:
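<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="f">

  <xsl:template match="/">
    <!-- encoding is an attribute value template, so it can be computed at run time -->
    <xsl:result-document href="output-filename" encoding="{f:get-encoding(.)}">
      <!-- your code -->
    </xsl:result-document>
  </xsl:template>

  <!-- returns the encoding named in the XML declaration of the document containing $node -->
  <xsl:function name="f:get-encoding">
    <xsl:param name="node" />
    <!-- assumes the XML declaration is on the first line and declares an encoding;
         the character class also allows '.' and '_', as in Shift_JIS -->
    <xsl:variable name="regex">^.*encoding\s*=\s*['"]([A-Za-z][A-Za-z0-9._-]*)['"].*$</xsl:variable>
    <xsl:value-of select="replace(tokenize(unparsed-text($node/base-uri()), '\r?\n')[1], $regex, '$1')"/>
  </xsl:function>

</xsl:stylesheet>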
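One caveat with this trick: replace() returns its input unchanged when the regex does not match, so an input without an encoding declaration would yield the whole first line as the "encoding". A hedged variant that guards against this might look as follows. The function name f:get-encoding-or-default and the fallback to UTF-8 are my own assumptions (a document without a declaration could also be UTF-16 with a byte order mark), and xsl:copy-of merely stands in for your own code:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <xsl:template match="/">
    <xsl:result-document href="output-filename"
                         encoding="{f:get-encoding-or-default(.)}">
      <!-- placeholder for your own code: copies the input unchanged -->
      <xsl:copy-of select="."/>
    </xsl:result-document>
  </xsl:template>

  <!-- hypothetical helper: declared encoding, or 'UTF-8' if none is declared -->
  <xsl:function name="f:get-encoding-or-default" as="xs:string">
    <xsl:param name="node" as="node()"/>
    <xsl:variable name="regex">^.*encoding\s*=\s*['"]([A-Za-z][A-Za-z0-9._-]*)['"].*$</xsl:variable>
    <!-- first line of the raw source, with or without a carriage return -->
    <xsl:variable name="first-line"
                  select="tokenize(unparsed-text(base-uri($node)), '\r?\n')[1]"/>
    <!-- only extract the name when the declaration actually matches -->
    <xsl:sequence select="if (matches($first-line, $regex))
                          then replace($first-line, $regex, '$1')
                          else 'UTF-8'"/>
  </xsl:function>

</xsl:stylesheet>
```

Like the original, this still assumes the XML declaration sits on the first line of the file.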
Or even on xsl:output in XSLT 3.0, using a static parameter, a static variable holding an inline function, and a shadow attribute. In short, just a few lines of code that show quite a few new concepts of XSLT, XPath and XDM:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- static parameters and variables are evaluated at compile time -->
  <xsl:param name="input-url" static="yes" select="'yourinput.xml'" />

  <!-- an inline function bound to a static variable; the apostrophe must be
       written as &apos; because the attribute itself is delimited by apostrophes -->
  <xsl:variable name="get-encoding" static="yes" select='
    let $regex := "^.*encoding\s*=\s*[&apos;""]([A-Za-z][A-Za-z0-9._-]*)[&apos;""].*$"
    return function($n) {
      replace(tokenize(unparsed-text($n), "\r?\n")[1], $regex, "$1")
    }' />

  <!-- a shadow attribute is replaced with the actual attribute by the same name -->
  <xsl:output _encoding="{$get-encoding($input-url)}" />

  <xsl:template match="/">
    <!-- your code here -->
    <result />
  </xsl:template>

</xsl:stylesheet>
This code runs correctly with Exselt, but my version of Saxon did not (yet) support it: it does not allow the use of unparsed-text in a static expression. I'm sure that will come soon, or it may be configurable somehow. I didn't test other XSLT processors.
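If your processor rejects unparsed-text in a static expression, one possible workaround (a sketch, not something I tested across processors) is to keep the shadow attribute but let the caller determine the encoding and pass it in as a static parameter when the stylesheet is compiled. The parameter name output-encoding and its UTF-8 default are assumptions for illustration, and xsl:copy-of stands in for your own code:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- hypothetical static parameter, supplied by the caller at compile time -->
  <xsl:param name="output-encoding" static="yes" as="xs:string" select="'UTF-8'"/>

  <!-- the shadow attribute copies the parameter value into the encoding attribute -->
  <xsl:output _encoding="{$output-encoding}"/>

  <xsl:template match="/">
    <!-- placeholder for your own code: copies the input unchanged -->
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

The drawback is that the encoding is no longer detected by the stylesheet itself, so whatever invokes the transformation has to read the XML declaration first.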