4
<?xml version="1.0" encoding="UTF-16"?>
    <note>
        <from>Jani</from>
        <to>ALOK</to>
        <message>AshuTosh</message>
    </note>

I have an XML parser that supports only UTF-8 encoding; with anything else it throws a SAX parser exception. How can I convert the UTF-16 document to UTF-8?

Alok Chaudhary
  • There is a 100% chance that your parser supports UTF-16. Give us the name of the parser, the version and the error message so we can help. – Aaron Digulla Feb 23 '12 at 13:02
  • @AaronDigulla Thanks for your interest. I solved the problem with the answer provided by Jörn Horstmann. Anyway, the name of the parser is com.sun.xml.fastinfoset.dom.DOMDocumentParser – Alok Chaudhary Feb 23 '12 at 13:59
  • That parser definitely supports UTF-16. Make 100% sure that your documents are proper UTF-16 and that you use the correct APIs. – Aaron Digulla Feb 23 '12 at 15:27
  • @Maksud_Tiger: Please give back to the community and *accept* the answer. – home Feb 23 '12 at 17:47

1 Answer

5

In that case it's not an XML parser that you are using; see section 2.2 of the XML specification:

All XML processors MUST accept the UTF-8 and UTF-16 encodings of Unicode

Java XML parsers usually receive their input wrapped in an InputSource object. An InputSource can be constructed with a Reader, which does the character decoding for the given charset.

InputStream in = ...
InputSource is = new InputSource(new InputStreamReader(in, "utf-16"));

For the "utf-16" charset the stream should start with a byte order mark; if it does not, use either "utf-16le" or "utf-16be" to state the byte order explicitly.
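
As a rough end-to-end sketch of the approach above: the class name Utf16XmlExample and the helper parseUtf16 are made up for illustration, and the standard JAXP DocumentBuilder stands in for whatever parser you actually use. The sample bytes are produced with a byte order mark, so the plain UTF-16 charset can detect the byte order.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library.
public class Utf16XmlExample {

    // Decode the bytes as UTF-16 ourselves and hand the parser characters
    // instead of raw bytes (assumption: the stream starts with a BOM).
    static Document parseUtf16(InputStream in) throws Exception {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_16);
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
                + "<note><from>Jani</from><to>ALOK</to>"
                + "<message>AshuTosh</message></note>";
        // getBytes(UTF_16) writes a big-endian BOM followed by the text.
        byte[] utf16 = xml.getBytes(StandardCharsets.UTF_16);

        Document doc = parseUtf16(new ByteArrayInputStream(utf16));
        System.out.println(doc.getElementsByTagName("to").item(0).getTextContent());
    }
}
```

Because the parser is given a character stream, the decoding is already done by the time it sees the document, so no separate conversion to UTF-8 is needed.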

Jörn Horstmann
  • :-/ The parser should read the XML header (which contains the encoding) and use the rules mentioned above to process the document correctly. You should never define an encoding yourself when reading XML because that will break if someone sends you something using a different encoding. – Aaron Digulla Feb 23 '12 at 15:29