CharsetDecoder not working on linux machine

Question

The following code does not work on a Linux machine:

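        import java.io.FileOutputStream;
        import java.io.PrintStream;
        import java.nio.ByteBuffer;
        import java.nio.charset.CharacterCodingException;
        import java.nio.charset.Charset;
        import java.nio.charset.CharsetDecoder;

        // The enclosing method must declare "throws IOException":
        // new FileOutputStream(...) and fo.close() can throw it.
        Charset charset = Charset.forName("UTF-8");
        CharsetDecoder decoder = charset.newDecoder();

        try {
            FileOutputStream fo = new FileOutputStream("hi.txt");
            PrintStream ps = new PrintStream(fo); // encodes with the platform default charset
            String msgBody = "ÃŽÂ£YMMETOXH";
            // Decodes msgBody as UTF-8 twice; both getBytes() calls use the platform default charset
            ps.println(decoder.decode(ByteBuffer.wrap(decoder.decode(ByteBuffer.wrap(msgBody.getBytes())).toString().getBytes())));
            ps.close();
            fo.close();
        } catch (CharacterCodingException e) {
            e.printStackTrace();
        }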

This code works on Windows. What could be the issue? On the Linux machine, the decoder does not decode the string.

Does it throw some kind of exception? Does it print garbage into the file? Also, your file name is quite Windows-specific, although I'm sure Linux is able to create a file with such a convoluted name in your current directory. — andri, Nov 18 '11 at 13:27
It doesn't throw any exceptions. It creates a file and writes the same thing as in the `msgBody` variable instead of the decoded string of `msgBody`. — Harry Joy, Nov 18 '11 at 13:28

score 3 · Accepted Answer · answered Nov 18 '11 at 13:28


The problem is that you're using String.getBytes() at least once, possibly twice (your enormously long line is hard to read; using several statements would make it easier to understand). That doesn't specify an encoding, so it'll use the platform default encoding. At that point, you've got a platform dependency... hence the problem.
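A minimal sketch of that platform dependency: the no-argument `getBytes()` result depends on `Charset.defaultCharset()`, while the overload that takes a `Charset` produces the same bytes on every machine. The class name `GetBytesDemo` and the `hex` helper are illustrative only, not part of the question's code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GetBytesDemo {
    // Formats bytes as space-separated uppercase hex, e.g. "CE A3".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sigma = "\u03A3"; // GREEK CAPITAL LETTER SIGMA

        // No argument: uses Charset.defaultCharset(), so the result varies by machine/JVM settings.
        System.out.println("default (" + Charset.defaultCharset() + "): " + hex(sigma.getBytes()));

        // Explicit charset: identical on every machine.
        System.out.println("UTF-8:      " + hex(sigma.getBytes(StandardCharsets.UTF_8)));      // CE A3
        // Sigma is not in ISO-8859-1, so getBytes substitutes the replacement byte '?'.
        System.out.println("ISO-8859-1: " + hex(sigma.getBytes(StandardCharsets.ISO_8859_1))); // 3F
    }
}
```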

It's not at all clear what you're trying to achieve, but if you're looking for reasons for platform-specific behaviour, that's the first thing to look at.
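To illustrate how two errors can cancel out, here is a sketch that *assumes* the Windows machine's default charset is windows-1252 and the Linux machine's is UTF-8 (the question states neither). Encoding `Σ` as UTF-8 and decoding the bytes as windows-1252 gives `Î£`; doing it a second time gives `ÃŽÂ£`, which matches the start of the question's `msgBody`. Re-encoding as windows-1252 and decoding as UTF-8 undoes one round, and that is roughly what each `getBytes()` + `decoder.decode(...)` pair in the question does on a windows-1252 default. With a UTF-8 default the same pair is a no-op, which would leave the string unchanged. `MojibakeDemo`, `misread` and `repair` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Assumption: windows-1252 stands in for the Windows machine's default charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // One mistake: encode as UTF-8, then decode those bytes as windows-1252.
    static String misread(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Undoes one mistake: re-encode as windows-1252, then decode as UTF-8.
    // Roughly mirrors decoder.decode(ByteBuffer.wrap(s.getBytes())) on a windows-1252 default.
    static String repair(String s) {
        return new String(s.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sigma = "\u03A3";
        String once = misread(sigma);  // I-circumflex, pound sign
        String twice = misread(once);  // A-tilde, Z-caron, A-circumflex, pound sign
        System.out.println(twice.equals("\u00C3\u017D\u00C2\u00A3")); // true
        // ASCII letters are unaffected, so the rest of the question's string survives as-is.
        System.out.println(repair(repair(twice + "YMMETOXH")).equals(sigma + "YMMETOXH")); // true
    }
}
```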

Oh, and creating a PrintStream like that will have the same issue... create an OutputStreamWriter with a specific encoding instead.
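A minimal sketch of that suggestion: write through an `OutputStreamWriter` constructed with an explicit UTF-8 charset, so the bytes on disk do not depend on the platform default, and read them back with the same explicit charset. The class and method names are illustrative, and the temp file stands in for the question's `hi.txt`.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class WriteUtf8File {
    // Writes one line of text encoded as UTF-8, independent of the platform default charset.
    static void writeLineUtf8(File file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
            out.write(System.lineSeparator());
        }
    }

    // Reads the whole file back, decoding it explicitly as UTF-8.
    static String readUtf8(File file) throws IOException {
        return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The Greek word from the question, written as Unicode escapes.
        String word = "\u03A3\u03A5\u039C\u039C\u0395\u03A4\u039F\u03A7\u0397";
        File file = File.createTempFile("hi", ".txt"); // stand-in for "hi.txt"
        file.deleteOnExit();
        writeLineUtf8(file, word);
        System.out.println(readUtf8(file).trim().equals(word)); // true
    }
}
```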


Jon Skeet


I provided "UTF-8" to getBytes() but still face the same issue. – Harry Joy Nov 18 '11 at 13:33
@HarryJoy: It's not clear what you mean by "the same issue" as you haven't explained what you're trying to do vs what's actually happening. Note my bit about using `PrintStream` by the way. There's *lots* of encoding and decoding going on in your code - the problem could be in any of them (or multiple). – Jon Skeet Nov 18 '11 at 13:35
I'm trying to decode a Greek word and save it in a file. I'm receiving the Greek word as `ÃŽÂ£YMMETOXH` in my servlet. When I decode it with the above code, it stays the same on the Linux machine, but on Windows it works fine. I also tried `OutputStreamWriter`; that doesn't work either. – Harry Joy Nov 18 '11 at 13:41
@HarryJoy: It sounds like your servlet is receiving the text incorrectly to start with - as far as I'm aware, "£" isn't a valid character in a Greek word... I wouldn't be surprised if the problem was that the code on Windows is *coincidentally* making two errors which cancel each other out. – Jon Skeet Nov 18 '11 at 13:43
I'm entering the Greek word as `ΣΥΜΜΕΤΟΧΗ` – Harry Joy Nov 18 '11 at 13:45
@HarryJoy: Into the web page? Right. Now work out *exactly* how that's going to be transmitted in the web request, and make sure that you can decode that correctly in the servlet. Once you've *received* it correctly, then you can work out how to save it. It's really important that you tackle one conversion at a time. – Jon Skeet Nov 18 '11 at 13:46
Can you tell me which encoding it will use on Windows when I call getBytes() without an argument? – Harry Joy Nov 18 '11 at 13:53
@HarryJoy: See `Charset.defaultCharset` - but you shouldn't just try to make your Linux box use the same encoding as the Windows box. You should work out *where* things are going wrong. It's not clear to me that you've mapped out each stage in the conversion process, and worked out what encoding should be used at each stage. – Jon Skeet Nov 18 '11 at 13:56
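One way to check the web-request conversion on its own is to simulate what the browser sends. The sketch below *assumes* the form page is served as UTF-8 and uses `URLEncoder`/`URLDecoder` as a stand-in for a real form submission. In an actual servlet, the charset used to decode POST body parameters is chosen by `request.setCharacterEncoding(...)` (called before reading any parameter) or, failing that, by the container's default, which has often been ISO-8859-1. `FormEncodingDemo` and `WORD` are illustrative names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FormEncodingDemo {
    // The Greek word from the question, written as Unicode escapes.
    static final String WORD = "\u03A3\u03A5\u039C\u039C\u0395\u03A4\u039F\u03A7\u0397";

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Step 1 (assumption: the form page is UTF-8): what the browser puts on the wire.
        String wire = URLEncoder.encode(WORD, "UTF-8");
        System.out.println(wire); // %CE%A3%CE%A5%CE%9C%CE%9C%CE%95%CE%A4%CE%9F%CE%A7%CE%97

        // Step 2a: decoding with the charset the browser used gives the word back.
        System.out.println(URLDecoder.decode(wire, "UTF-8").equals(WORD)); // true

        // Step 2b: decoding the same bytes as ISO-8859-1 yields mojibake that starts with
        // I-circumflex + pound sign instead of Sigma.
        String wrong = URLDecoder.decode(wire, "ISO-8859-1");
        System.out.println(wrong.startsWith("\u00CE\u00A3")); // true
    }
}
```

Only once step 2 reliably yields the original word is it worth looking at how the file gets written.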
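To see which charset the no-argument `getBytes()` uses on a given machine, print `Charset.defaultCharset()`. On older JDKs it is derived from the `file.encoding` system property and the OS locale, so it is typically a code page such as windows-1252 on Western-locale Windows and UTF-8 on most Linux setups; since JDK 18 (JEP 400) the default is UTF-8 on all platforms unless overridden. As the comment says, this is for diagnosis; the fix is to pass explicit charsets rather than to align the defaults.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    public static void main(String[] args) {
        // Used by String.getBytes(), new String(byte[]), new PrintStream(OutputStream), FileWriter, ...
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        // The system property the default is normally derived from.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
    }
}
```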
