Hi, I have a scenario where the default charset needs to be overridden by UTF-8. I am using the class below, but I am not getting the expected output. I compare the results against a Unix system whose default charset is UTF-8. Am I wrong somewhere in this program?

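import java.nio.charset.Charset;

public class CharsetDisplay {

    public static void main(String[] args) {
        // Platform default charset (windows-1252 on this machine)
        System.out.println(Charset.defaultCharset().name());
        System.out.println(Charset.isSupported("UTF-8"));
        final Charset UTF8_CHARSET = Charset.forName("UTF-8");
        try {
            // Two double quotes (34) followed by a NUL byte (0)
            byte[] byteArray = new byte[] {34, 34, 0};
            String str = new String(byteArray, UTF8_CHARSET);
            System.out.println("String*** " + str);
            System.out.println("String to Hex *** " + stringToHex(str));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // stringToHex was not included in the post; this is an assumed stand-in
    // that prints each char as a two-digit hex value.
    private static String stringToHex(String str) {
        StringBuilder hex = new StringBuilder();
        for (char c : str.toCharArray()) {
            hex.append(String.format("%02x", (int) c));
        }
        return hex.toString();
    }

}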

It prints this output:

windows-1252
true
String*** ""

Note that after the "" in the string output there is a special character, which I don't get in a Unix environment.

leppie
javanerd
  • Oops, sorry for the lost formatting in the code snippet above. Stack Overflow just removed all the new lines :( – javanerd Nov 05 '10 at 12:01
  • What are you trying to do exactly? Bytes `34 34 00` in UTF-8 mean `double-quote double-quote null`, which is what you got. – Nicolas Repiquet Nov 05 '10 at 12:09
  • I was viewing the output in Eclipse, which showed a special character after the double quotes. I have now tried it from a DOS command prompt and it worked fine. Thanks – javanerd Nov 05 '10 at 13:01

2 Answers

What do you expect the zero byte to render as in this environment? Your output looks exactly correct to me.

Don't forget that any differences that you encounter between environments might not be down to Java. If you're invoking your Java program from a console (which I expect you are), it's up to the console to actually convert the program's output to what you see on the screen. So depending on the charset the console is using, it's entirely possible for Java to output the characters that you expect, but for the console to fail to render them properly.
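
One way to check what Java actually produced, independent of how a console draws it, is to print the code points of the decoded string rather than the string itself. The following is only a minimal sketch: the class name `CodePointDump` and the helper `describe` are made up for illustration and are not part of the question's code, and it assumes Java 8 or later for `String.codePoints()`.

```java
import java.nio.charset.StandardCharsets;

public class CodePointDump {

    // Hypothetical helper: renders each code point as U+XXXX so the result
    // is plain ASCII and does not depend on the console's charset.
    public static String describe(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Same bytes as in the question: two double quotes and a NUL byte.
        byte[] bytes = {34, 34, 0};
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(describe(decoded)); // prints "U+0022 U+0022 U+0000"
    }
}
```

If this prints the same line on both machines, the decoding is identical and any visible difference comes from how the terminal or IDE console renders U+0000.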

Andrzej Doyle
  • Thanks for the reply. I was using Eclipse and was seeing a special character. The same program run from a command prompt shows no special character. So I can safely assume that the output depends on the charset of the console. Thanks a ton :) – javanerd Nov 05 '10 at 12:59

If Java doesn't pick up your locale's encoding properly, you may have to set it explicitly on the command line:

java -Dfile.encoding=utf-8 CharsetDisplay
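
To see whether the flag had any effect, it may help to print what the JVM reports as its default charset. This is a rough sketch, not something from the answer itself: the class name `DefaultCharsetCheck` and the helper `decodeUtf8` are hypothetical. It also illustrates that decoding with an explicit charset, as the question's code already does, should not depend on the default at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {

    // Hypothetical helper: decodes bytes with an explicit charset, so the
    // result does not depend on the platform default.
    public static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Reports what the JVM settled on; compare runs with and without the flag.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset().name());

        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9 (e with acute accent),
        // so the decoded string has exactly one char.
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9};
        System.out.println("decoded length = " + decodeUtf8(bytes).length());
    }
}
```

Running it once as `java DefaultCharsetCheck` and once with `-Dfile.encoding=utf-8` shows whether the default changes on a given JVM; the decoded length stays the same either way.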
bobince