In Delphi XE2, I'm doing an XSLT transform on a received XML file to remove all namespace information.
Problem: It changes

<?xml version="1.0" encoding="utf-8"?>

into

<?xml version="1.0" encoding="utf-16"?>

This is the XML that I get back from Exchange server:

<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Header>
<h:ServerVersionInfo MajorVersion="14" MinorVersion="0" MajorBuildNumber="722" MinorBuildNumber="0" Version="Exchange2010" xmlns:h="http://schemas.microsoft.com/exchange/services/2006/types" xmlns="http://schemas.microsoft.com/exchange/services/2006/types" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>
</s:Header>
<s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<m:ResolveNamesResponse xmlns:m="http://schemas.microsoft.com/exchange/services/2006/messages" xmlns:t="http://schemas.microsoft.com/exchange/services/2006/types">
<m:ResponseMessages>
<m:ResolveNamesResponseMessage ResponseClass="Success">
<m:ResponseCode>NoError</m:ResponseCode>
<m:ResolutionSet TotalItemsInView="1" IncludesLastItemInRange="true">
<t:Resolution>
<t:Mailbox>
<t:Name>developer</t:Name>
<t:EmailAddress>developer@timetellbv.nl</t:EmailAddress>
<t:RoutingType>SMTP</t:RoutingType>
<t:MailboxType>Mailbox</t:MailboxType>
</t:Mailbox>
<t:Contact>
<t:Culture>nl-NL</t:Culture>
<t:DisplayName>developer</t:DisplayName>
<t:GivenName>developer</t:GivenName>
<t:EmailAddresses>
<t:Entry Key="EmailAddress1">SMTP:developer@timetellbv.nl</t:Entry>
</t:EmailAddresses>
<t:ContactSource>ActiveDirectory</t:ContactSource>
</t:Contact>
</t:Resolution>
</m:ResolutionSet>
</m:ResolveNamesResponseMessage>
</m:ResponseMessages>
</m:ResolveNamesResponse>
</s:Body>
</s:Envelope>

This is the function that removes the namespace info:

Uses
   MSXML2_TLB; // IXMLDOMdocument

class function TXMLHelper.RemoveNameSpaces(XMLString: String): String;
const
  // An XSLT script for removing the namespaces from any document.
  // From http://wiki.tei-c.org/index.php/Remove-Namespaces.xsl
  cRemoveNSTransform =
    '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">' +
    '<xsl:output method="xml" indent="no"/>' +

    '<xsl:template match="/|comment()|processing-instruction()">' +
    '    <xsl:copy>' +
    '      <xsl:apply-templates/>' +
    '    </xsl:copy>' +
    '</xsl:template>' +

    '<xsl:template match="*">' +
    '    <xsl:element name="{local-name()}">' +
    '      <xsl:apply-templates select="@*|node()"/>' +
    '    </xsl:element>' +
    '</xsl:template>' +

    '<xsl:template match="@*">' +
    '    <xsl:attribute name="{local-name()}">' +
    '      <xsl:value-of select="."/>' +
    '    </xsl:attribute>' +
    '</xsl:template>' +

    '</xsl:stylesheet>';

var
  Doc, XSL: IXMLDOMdocument2;
begin
  Doc := ComsDOMDocument.Create;
  Doc.ASync := false;
  XSL := ComsDOMDocument.Create;
  XSL.ASync := false;
  try
     Doc.loadXML(XMLString);
     XSL.loadXML(cRemoveNSTransform);
     Result := Doc.TransFormNode(XSL);
  except
     on E:Exception do Result := E.Message;
  end;
end; { RemoveNameSpaces }

But after this, it's suddenly a utf-16 document:

<?xml version="1.0" encoding="UTF-16"?>
<Envelope>
[snip]
</Envelope>

After Googling "xsl utf-8 utf-16" I tried several things:

  • Change the line (e.g. Output DataTable XML in UTF8 rather than UTF16)

    <xsl:output method="xml" indent="no"/>
    

    into either:

    <xsl:output method="xml" encoding="utf-8" indent="no"/>
    <xsl:output method="xml" encoding="utf-8"/>
    <xsl:output encoding="utf-8"/>
    

    That did not work.
    (It would be the optimal solution, according to http://www.xml.com/pub/a/2002/09/04/xslt.html "The encoding attribute actually does more than add an encoding declaration to the result document; it tells the XSLT processor to write out the result using that encoding.")

  • Change the line (e.g. XslCompiledTransform uses UTF-16 encoding)

    <xsl:output method="xml" indent="no"/>
    

    into

    <xsl:output method="xml" omit-xml-declaration="yes" indent="no" />
    

    which leaves out the XML declaration, but if I then just prepend

    <?xml version="1.0" encoding="utf-8"?>
    

    I will lose characters because no actual utf conversion is done.

  • IXMLDOMdocument2 does not have an Encoding property

Any ideas how to fix this?

Remarks/background:

  • If all else fails there's maybe still the option to change the utf-16 XML data to utf-8, but that's an entirely different approach.

  • I'm trying to do everything utf-8 because I'm communicating with Exchange server through EWS, and setting the http request header to utf-16 does not work: Exchange tells me that the content-type 'text/xml; charset = utf-16' is not the expected type 'text/xml; charset = utf-8'. EWS returns utf-8 (see start of post).

Jan Doggen

2 Answers

The problem is the use of the transformNode method: it returns a string, and with MSXML such a string is UTF-16 encoded. Instead, create an empty MSXML DOM document for the result and call the transformNodeToObject method, passing the empty DOM document as the second argument. You can then save the result document to a file or stream, and the encoding should be as specified in the xsl:output directive.
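
A minimal sketch of that approach, not part of the original answer and untested against Exchange data. It assumes the `Winapi.msxml` unit that ships with Delphi (rather than the question's imported `MSXML2_TLB`), a stylesheet whose `xsl:output` declares `encoding="utf-8"`, and a hypothetical helper name `TransformToBytes`. The result is returned as raw bytes so that no string conversion can alter the encoding MSXML wrote.

```pascal
uses
  System.SysUtils, System.Classes, Winapi.ActiveX, Winapi.msxml;

// Hypothetical helper: applies XSLString to XMLString and returns the bytes
// that MSXML writes when the result document is saved to a stream.
function TransformToBytes(const XMLString, XSLString: string): TBytes;
var
  Doc, XSL, Res: IXMLDOMDocument;
  Buffer: TMemoryStream;
  Adapter: IStream;
begin
  Doc := CoDOMDocument.Create;
  Doc.async := False;
  XSL := CoDOMDocument.Create;
  XSL.async := False;
  Res := CoDOMDocument.Create;   // empty document that receives the result

  if not Doc.loadXML(XMLString) then
    raise Exception.Create('Input XML: ' + Doc.parseError.reason);
  if not XSL.loadXML(XSLString) then
    raise Exception.Create('Stylesheet: ' + XSL.parseError.reason);

  // Transform into the empty DOM document instead of into a string.
  Doc.transformNodeToObject(XSL, Res);

  Buffer := TMemoryStream.Create;
  try
    // TStreamAdapter exposes the TStream as a COM IStream; soReference
    // means the adapter does not free Buffer.
    Adapter := TStreamAdapter.Create(Buffer, soReference);
    Res.save(Adapter);
    Adapter := nil;              // release the adapter before freeing Buffer

    SetLength(Result, Buffer.Size);
    if Buffer.Size > 0 then
      Move(Buffer.Memory^, Result[0], Buffer.Size);
  finally
    Buffer.Free;
  end;
end;
```

Whether the saved bytes really are UTF-8 depends on MSXML honouring the `xsl:output` encoding for this path, so it is worth checking the first bytes of the result before sending them to EWS.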

Martin Honnen
  • I'd say that the DOM is internally implemented using UTF-16, hence the result of the transformation in the target DOM document will be encoded in UTF-16 as well. Encoding per se should be a task for an input/output filter, so I'd expect it to be necessary to call e.g. `iXMLDocument.SaveToXML(AUTF8String)` – pf1957 Apr 18 '13 at 09:34
  • MSXML does not have a method named `SaveToXML`. It has a method named `save` on DOM documents and my suggestion is to use that method on a DOM document that was created empty and then passed in to the `transformNodeToObject` method. That way, if you save to a file or stream, the encoding should be as intended. That is not possible if you use `transformNode`. – Martin Honnen Apr 18 '13 at 10:04
  • I know. I usually don't call MSXML directly but go through `IXMLDocument`/`IXMLNode`. There are overloaded `TransformNode` methods, and one of them calls `transformNodeToObject`. My comment concerned the necessity of performing some kind of **save** operation to ensure proper encoding. The result can be saved easily, e.g. by calling SaveToXML and passing a var argument of type UTF8String. – pf1957 Apr 18 '13 at 10:15
  • I prefer the IXMLDocument approach, (OLEVariants tell nothing about types and methods, no code completion possible etc). I'm trying var Doc, XSL, Res: IXMLDocument; TopNode : IXMlNode; Doc := TXMLDocument.Create(nil); Doc.Active := true; XSL := TXMLDocument.Create(nil); XSL.Active := true; Res := TXMLDocument.Create(nil); Res.Active := true; TopNode := Doc.documentElement; Doc.XML.Text := XMLString; XSL.XML.Text := cRemoveNSTransform; TopNode.TransformNode(XSL.DocumentElement,Res); Result := Res.XML.Text; but get 'No active document' on TransformNode. What's wrong? – Jan Doggen Apr 18 '13 at 13:42
  • I am not sure what you want to achieve by assigning to `Doc.XML.Text` or `XSL.XML.Text`. Use `loadXML(string)` to parse a string with XML/XSLT markup into a DOM document. And what exactly is it you want to achieve, what kind of result are you looking for? If you need a file with the transformation result, I strongly suggest you let MSXML create that by calling `Res.Save("result.xml")`. If you don't want a file with the transformation result but merely a string, then I am not sure the attempt with `transformNodeToObject` helps: the `xml` property of a DOM document is also a UTF-16 encoded string – Martin Honnen Apr 18 '13 at 14:28

To use IXMLDocument in your original code, it should look like this:

var
  iInp, iOtp, iXsl: IXMLDocument;
  Utf8: UTF8String;
begin
  iInp := LoadXMLData(XMLString);
  iXsl := LoadXMLData(cRemoveNSTransform);
  iOtp := NewXMLDocument;
  iInp.Node.TransformNode(iXsl.Node,iOtp);
  iOtp.SaveToXML(Utf8);
end;

Now the variable Utf8 should contain the transformed XML in UTF-8 encoding. If you want to save to a stream or file instead, replace SaveToXML with:

  iOtp.Encoding := 'UTF-8';
  iOtp.SaveToFile(....);
pf1957