
I have a problem with XML encoding. When I create the XML on localhost with cp1251 encoding, everything works fine.
But when I deploy my module on the server, the XML file contains incorrect symbols like "ФайлПФР".

 Transformer transformer = TransformerFactory.newInstance().newTransformer();
 StringWriter writer = new StringWriter();
 StreamResult result = new StreamResult(writer);
 DOMSource source = new DOMSource(doc);

 transformer.setOutputProperty(OutputKeys.ENCODING, "cp1251");
 transformer.setOutputProperty(OutputKeys.INDENT, "yes");
 transformer.transform(source, result);

 String attach = writer.toString();

How can I fix it?

user3114474

2 Answers


I tried to read an XML Document which was UTF-8 encoded, and attempted to transform it with a different encoding, which had no effect at all (the existing encoding of the document was used instead of the one I specified with the output property). When creating a new Document in memory (encoding is null), the output property was used correctly.

It looks like, when transforming an XML Document, the output property OutputKeys.ENCODING is only used when the org.w3c.dom.Document does not have an encoding yet.
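The "does not have an encoding yet" part can be checked with the DOM Level 3 method Document.getXmlEncoding(), which returns the encoding from the XML declaration of a parsed document and null for a document built in memory. The sketch below only shows that difference; it does not itself prove how the Transformer uses the value. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, only for illustrating getXmlEncoding().
public class XmlEncodingCheck {

    // Parses the given XML and returns the encoding recorded from its declaration.
    static String parsedEncoding(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getXmlEncoding();
    }

    // Builds a document in memory; no declaration was ever read, so no encoding is recorded.
    static String inMemoryEncoding() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        doc.appendChild(doc.createElement("foo"));
        return doc.getXmlEncoding();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parsedEncoding("<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo/>")); // UTF-8
        System.out.println(inMemoryEncoding()); // null
    }
}
```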

Solution
To change the encoding of an XML Document, don't use the Document itself as the source; use its root node (the document element) instead.

// use doc.getDocumentElement() instead of doc
DOMSource source = new DOMSource(doc.getDocumentElement());

Works like a charm.

Source document:

<?xml version="1.0" encoding="UTF-8"?>
<foo bla="Grüezi">
    Encoding test äöüÄÖÜ «Test»
</foo>

Output with "cp1251":

<?xml version="1.0" encoding="WINDOWS-1251"?><foo bla="Gr&#252;ezi">
    Encoding test &#228;&#246;&#252;&#196;&#214;&#220; «Test»
</foo>
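Combining this answer's root-element source with output to a byte stream (rather than a StringWriter) gives roughly the following. It is a sketch, not the answerer's code: the class and method names are hypothetical, and "windows-1251" is used as the encoding name (the question used the Java alias "cp1251"). The Cyrillic sample text is written with Unicode escapes so the source file's own encoding does not matter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical class name, for illustration only.
public class RootElementTransform {

    // Serializes the document element of 'doc' as windows-1251 bytes.
    static byte[] toWindows1251(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "windows-1251");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        // Root element as the source, as suggested in this answer.
        DOMSource source = new DOMSource(doc.getDocumentElement());

        // Write bytes: the encoding is applied when characters become bytes.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(source, new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // "ФайлПФР" as Unicode escapes
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<foo>\u0424\u0430\u0439\u043B\u041F\u0424\u0420</foo>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        byte[] bytes = toWindows1251(doc);
        System.out.println(new String(bytes, Charset.forName("windows-1251")));
    }
}
```

The resulting bytes can then be written to a file or a response stream as they are, without converting them back into a String first.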
Peter Walser

A StringWriter is not affected by the output encoding (only the encoding used when reading the input matters), because Java keeps all text in memory as Unicode. Either write to a binary stream, or output the resulting string as Cp1251.
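To illustrate the point that a String carries no encoding of its own, here is a small sketch (the class name is hypothetical): the same text produces different bytes depending on the charset chosen at conversion time, and decoding windows-1251 bytes as UTF-8 garbles it, which is one plausible way "incorrect symbols" show up on a machine with different defaults.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
public class CharsetBytesDemo {

    static final Charset CP1251 = Charset.forName("windows-1251");

    // "ФайлПФР" as Unicode escapes, so the source file encoding does not matter
    static final String TEXT = "\u0424\u0430\u0439\u043B\u041F\u0424\u0420";

    public static void main(String[] args) {
        byte[] cp1251 = TEXT.getBytes(CP1251);               // one byte per Cyrillic letter
        byte[] utf8 = TEXT.getBytes(StandardCharsets.UTF_8); // two bytes per Cyrillic letter
        System.out.println(cp1251.length + " " + utf8.length); // 7 14

        // Reading windows-1251 bytes back with the wrong charset does not restore the text.
        String misread = new String(cp1251, StandardCharsets.UTF_8);
        System.out.println(misread.equals(TEXT)); // false
    }
}
```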

Note that the encoding should also appear in the XML declaration, as in <?xml version="1.0" encoding="Windows-1251"?>. I guess "Cp1251" is a more Java-specific name.

So the error probably lies in how the string is written out, for instance:

response.setCharacterEncoding("Windows-1251"); // must be called before getWriter()
response.getWriter().write(attach);

Or

byte[] bytes = attach.getBytes("Windows-1251"); // throws UnsupportedEncodingException
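Outside a servlet, "output the string as Cp1251" can be sketched with an OutputStreamWriter that is given the charset explicitly, instead of relying on the platform default. The class and method names here are hypothetical, and the OutputStream stands in for whatever the bytes are sent to (a file, a socket, a response stream).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class.
public class Cp1251WriterDemo {

    static final Charset CP1251 = Charset.forName("windows-1251");

    // Encodes an already serialized XML string as windows-1251 onto 'out'.
    static void writeXml(String xml, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, CP1251);
        writer.write(xml);
        writer.flush(); // flush, but leave closing 'out' to the caller
    }

    public static void main(String[] args) throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"windows-1251\"?>"
                + "<foo>\u0424\u0430\u0439\u043B\u041F\u0424\u0420</foo>";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeXml(xml, out);
        // Same bytes as converting the string directly with getBytes(CP1251)
        System.out.println(Arrays.equals(out.toByteArray(), xml.getBytes(CP1251))); // true
    }
}
```

Whichever route is used, the charset used for the bytes should match the encoding named in the XML declaration.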
Joop Eggen