
I am having trouble transforming an XML string in which one field contains Portuguese characters. I am using the transform method and encoding the input as ISO-8859-1, and I end up with the following error:

javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Invalid byte 2 of 3-byte UTF-8 sequence.

Here is the code I am using.

String transformedMessage = "";

ByteArrayInputStream bais = null;
try {
    bais = new ByteArrayInputStream(
            inputMessage.getBytes("iso-8859-1"));
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
StreamSource xsltSource = new StreamSource(
        new ByteArrayInputStream(xsltTemplate.getBytes()));

StreamSource source = new StreamSource(bais,"iso-8859-1");
ByteArrayOutputStream baos = new ByteArrayOutputStream();
StreamResult result = new StreamResult(baos);

TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(xsltSource);

transformer.transform(source, result);
transformedMessage = baos.toString();

return transformedMessage;

inputMessage has a name tag containing "Olá" (hex bytes: 4f 6c e1), which is Portuguese.
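For reference, the byte values quoted above can be checked with a small sketch like the following. It assumes nothing beyond the JDK; the class name OlaBytes and the helper hex are made up for illustration. It prints the bytes of "Olá" under ISO-8859-1 and under UTF-8, which shows that 4f 6c e1 is the ISO-8859-1 form, while a document that declares encoding="UTF-8" would need 4f 6c c3 a1.

```java
import java.nio.charset.StandardCharsets;

public class OlaBytes {
    // Hypothetical helper: render a byte array as space-separated lowercase hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Olá", written with an escape so the source file's own encoding does not matter.
        String name = "Ol\u00e1";
        System.out.println(hex(name.getBytes(StandardCharsets.ISO_8859_1))); // 4f 6c e1
        System.out.println(hex(name.getBytes(StandardCharsets.UTF_8)));      // 4f 6c c3 a1
    }
}
```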

The same code works if we send Chinese and Thai characters. Can you please help me with this error?

Here is the sample XML that I am using.

<?xml version="1.0" encoding="UTF-8"?><TransactionProcessor> <Request> <MessageData> <MessageType>Authorization</MessageType> <IPAddress>187.150.23.80</IPAddress> <IssueDate>20150715</IssueDate> <TravelAgencyName>Sindicato Olá</TravelAgencyName> <TravelDate>20150716</TravelDate> <IssuingCarrierCode>IJ</IssuingCarrierCode> </TransactionData> </Request> </TransactionProcessor>
  • It sounds like your XML actually declares that it's in UTF-8, not ISO-8859-1. Can you provide a short but complete example file that fails? (If it works with Chinese characters, it's definitely not in ISO-8859-1.) Where does your source data come from? Is it possible that it's just not been properly encoded? – Jon Skeet Jul 21 '15 at 06:11
  • Yes, the XML file declares UTF-8 and not ISO-8859-1. Even with UTF-8, I ended up with the same exception. So i used ISO Format to Change it. Here is the sample XML data. Authorization 187.150.23.80 20150715 Sindicato Olá 20150716 IJ – sunil gogula Jul 21 '15 at 06:14
  • "So i used ISO Format to Change it". Don't do that. Trying to "change" the encoding of data because you don't understand an error is almost *always* the wrong approach. It sounds to me like whatever's producing the file in the first place is broken. Please show the relevant section of the file as text *and* binary (e.g. in hex) in your question. – Jon Skeet Jul 21 '15 at 06:15
  • This is the place where it is failing: transformer.transform(source, result); and this is the sample XML that I am passing: Authorization 187.150.23.80 20150715 Sindicato Olá 20150716 IJ – sunil gogula Jul 21 '15 at 06:19
  • The XSLT we are using to Transform the message has the following Header. – sunil gogula Jul 21 '15 at 06:23
  • Forget the XSLT for the moment. Can you even read the XML file, without transforming it? All of this information should be in the *question*, by the way - not comments. (And we need to see the binary representation of the TravelAgencyName element. I suspect it's invalid.) – Jon Skeet Jul 21 '15 at 06:24
  • We can read and transform the XML properly for all other input until we pass a Portuguese character, as in Olá (binary equivalent 01001111011011001100001110100001) – sunil gogula Jul 21 '15 at 06:34
  • As I said, please put it in the *question* - and preferably as hex or at *least* separate bytes, instead of one long binary string... you're making it very hard for anyone to help you. – Jon Skeet Jul 21 '15 at 07:12
  • The hex equivalent of "Olá" is 4f 6c e1. I hope this helps. Is this the information that you are looking for? – sunil gogula Jul 21 '15 at 08:13
  • `xsltTemplate.getBytes()` uses the default encoding, use `xsltTemplate.getBytes("UTF-8")` or "ISO-8859-1". Also "Windows-1252" might be better than "ISO-8859-1" when using `€ “ — • ›` and others. – Joop Eggen Jul 21 '15 at 08:26
  • Well that's not UTF-8, therefore your file is broken, as I suspected. You should find out what's generating it, and fix it. It's still not clear to me why you've ignored my repeated requests for this information to be *in the question* though. – Jon Skeet Jul 21 '15 at 08:43
  • I have changed the code to reflect "Windows-1252", but no luck :( – sunil gogula Jul 21 '15 at 08:46
  • I have posted the relevant hex values in my question :) . But I am using the code that I posted in the question, and the methods are predefined in Java. So where might it have gone wrong? – sunil gogula Jul 21 '15 at 08:48
  • That binary doesn't match the hex. Curiously enough, that binary IS UTF-8: the binary representation is on this page (http://www.fileformat.info/info/unicode/char/e1/index.htm), and the last 16 digits of that binary match the UTF-8 encoding of aacute perfectly. – Flynn1179 Jul 21 '15 at 12:11
  • Thanks. If it matches the UTF-8 representation, then why do I end up with an exception when I send the name as "Olá"? Is there a way to get rid of that error? – sunil gogula Jul 22 '15 at 05:10
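
A note on the code in the question: the two-argument constructor StreamSource(InputStream, String) takes a system ID as its second argument, not an encoding, so passing "iso-8859-1" there does not tell the parser how to decode the bytes. The parser then trusts the encoding="UTF-8" declaration while the bytes were produced with getBytes("iso-8859-1"), and a lone e1 byte followed by "<" is not a valid UTF-8 sequence, which is consistent with the reported error. Below is a minimal sketch of one way around this, assuming inputMessage and xsltTemplate are already correctly decoded Java Strings. The class name CharStreamTransform, the sample constants, and the tiny stylesheet are made up for illustration. It hands the transformer character streams, so no byte encoding is involved at all.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CharStreamTransform {
    // Illustrative input only: a cut-down version of the XML from the question.
    public static final String SAMPLE_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<Request><TravelAgencyName>Sindicato Ol\u00e1</TravelAgencyName></Request>";

    // Illustrative stylesheet only: copies the agency name out as plain text.
    public static final String SAMPLE_XSLT =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:value-of select=\"/Request/TravelAgencyName\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Transform XML held in a String with a stylesheet held in a String.
    // Readers and a Writer are used, so the data stays as characters throughout
    // and is never converted to bytes in a charset that could disagree with
    // the XML declaration.
    public static String transform(String inputMessage, String xsltTemplate)
            throws TransformerException {
        StreamSource xsltSource = new StreamSource(new StringReader(xsltTemplate));
        StreamSource source = new StreamSource(new StringReader(inputMessage));
        StringWriter out = new StringWriter();

        Transformer transformer =
                TransformerFactory.newInstance().newTransformer(xsltSource);
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(SAMPLE_XML, SAMPLE_XSLT));
    }
}
```

If a byte stream is required instead, encoding the input with inputMessage.getBytes(StandardCharsets.UTF_8), so that the bytes match the declaration, should have the same effect; the original code also decodes the result with baos.toString(), which uses the platform default charset. Chinese and Thai input probably only appeared to work: characters that ISO-8859-1 cannot represent are, in practice, replaced with "?" by getBytes, which parses without error but loses the text.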
