
I am trying to print a string with the Unicode characters "\u65e5\u672c\u8a9e\u6587\u5b57\u5217"

How can I print it exactly as written? Java converts the string above into a non-readable format, as if it were applying its default character conversion for Unicode. How can I prevent this from happening?

I am running it on OS X.

Edit 1: Please provide a solution that does not require adding backslashes.
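To see what the compiler actually produces from that literal, here is a minimal sketch (the class name `UnicodeLiteralDemo` and the helper `codeUnits` are illustrative, not from the question). It prints the length and the code units rather than the characters themselves, so the result does not depend on the console's encoding or font.

```java
import java.util.StringJoiner;

class UnicodeLiteralDemo {
    // Labels each UTF-16 char of s as "U+XXXX", separated by spaces.
    static String codeUnits(String s) {
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < s.length(); i++) {
            out.add(String.format("U+%04X", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // javac replaces each backslash-u escape with one char before the
        // literal is built, so s holds six CJK characters, not 36 ASCII ones.
        String s = "\u65e5\u672c\u8a9e\u6587\u5b57\u5217";
        System.out.println(s.length());   // 6
        System.out.println(codeUnits(s)); // U+65E5 U+672C U+8A9E U+6587 U+5B57 U+5217
    }
}
```

If this reports six characters, the string itself is fine and any "???" output is an encoding or font problem on the way to the screen.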

AkD
  • Do you want to print `\u65e5...` or the corresponding unicode characters? – Ingo Jan 21 '13 at 22:22
  • I just want the string mentioned above to print as it is. Java tends to convert it once it is assigned directly to a string. – AkD Jan 21 '13 at 22:27
  • It doesn't "tend to". That's how you specify Unicode literals inside Java source code. – Isaac Jan 21 '13 at 22:38
  • @AkD If you want to print just the string "as it is", what is the problem? Perhaps that it is *not* what you think it is. – Ingo Jan 21 '13 at 22:46
  • It is not clear what you mean by "a string with the unicode characters." As you can see, many of us are confused as to whether you mean a string whose first character is `(char) 0x65e5` or a string whose first six characters are `{ '\\', 'u', '6', '5', 'e', '5' }`. You say "Java converts the above string into a non-readable format." What is the non-readable format? What are you seeing? – VGR Jan 22 '13 at 02:21
  • @VGR Note the double quotes around the string. Assign it to a Java String, as in `String vgr = "\u65e5\u672c\u8a9e\u6587\u5b57\u5217"`, and then call `System.out.print`. Let me know whether you get the same string, and if so, which encoding and font you are using. Anyway, I have tried UTF-8, UTF-16 and cp420 encodings so far without success. I get something like "??? @" etc. – AkD Jan 22 '13 at 22:13
  • By "have tried with utf-8, 16, and cp420" are you referring to something in your code? You should not need to specify an encoding anywhere. When I do `System.out.println("\u65e5\u672c\u8a9e\u6587\u5b57\u5217");` I see six CJK characters (on Linux, where my system locale is en_US.utf8). What do you see if you execute `javax.swing.JOptionPane.showMessageDialog(null, "\u65e5\u672c\u8a9e\u6587\u5b57\u5217");`? – VGR Jan 23 '13 at 11:54
  • @VGR Yes, I am doing it through code. My Eclipse is pointing to a different font on OS X. Let me try changing that to UTF-8. – AkD Jan 23 '13 at 18:44
  • Please show us the line(s) of code where you specify an encoding. – VGR Jan 23 '13 at 22:24
  • The Protobuf serializer in an upstream project was doing its own encoding. That's why I had a hard time figuring out the issue. Many people face this issue. I don't understand why it was voted down; it took me some time to figure this out, since these characters did not make any sense. It was a genuine question. But thanks, everyone, for helping out. – AkD Sep 25 '13 at 20:37
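If the "???" output comes from the console side rather than from the string (the possibility VGR's comments are probing), one common workaround is to bypass `System.out` and encode explicitly. This is a minimal sketch, assuming the terminal or IDE console is itself set to decode UTF-8; `Utf8Console` and `utf8Stream` are hypothetical names. It would not have helped in AkD's eventual case, where an upstream Protobuf serializer was doing its own encoding.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;

class Utf8Console {
    // Wraps a byte stream in a PrintStream that encodes chars as UTF-8,
    // regardless of the platform default charset.
    static PrintStream utf8Stream(OutputStream os) {
        try {
            return new PrintStream(os, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this is unreachable.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Write straight to the stdout file descriptor, bypassing the
        // default-charset encoder inside System.out.
        PrintStream out = utf8Stream(new FileOutputStream(FileDescriptor.out));
        out.println("\u65e5\u672c\u8a9e\u6587\u5b57\u5217");
    }
}
```

Each of the six characters becomes three bytes in UTF-8 (for example U+65E5 is `E6 97 A5`), so a console that decodes those bytes with another charset will show garbage even though the Java string is correct.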

3 Answers

If you're trying to print exactly that then you need to escape your backslashes:

\\u65e5\\u672c\\u8a9e\\u6587\\u5b57\\u5217

Edit: If this is not OK, or even if it is, check out this answer: `escapeJava` from Apache Commons sounds like it might be what you're looking for. Or maybe one of the `escapeHtml` methods? I'm not entirely sure whether `escapeJava` will work for Unicode.
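For comparison, this sketch (the class name `EscapedLiteral` is hypothetical) shows what the escaped literal from this answer holds at runtime: literal backslashes, six characters per escape, which is why it prints verbatim.

```java
class EscapedLiteral {
    // Each "\\" in the source is one backslash char at runtime, so the
    // compiler never sees a Unicode escape here: 6 escapes x 6 chars = 36.
    static final String ESCAPED = "\\u65e5\\u672c\\u8a9e\\u6587\\u5b57\\u5217";

    public static void main(String[] args) {
        System.out.println(ESCAPED);                   // the backslash form, verbatim
        System.out.println(ESCAPED.length());          // 36
        System.out.println(ESCAPED.charAt(0) == '\\'); // true
    }
}
```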

Jeff
  • This answer is wrong. There are no backslashes. The OP has a Unicode string and wants to print the encoded version of it. – Dmitry B. Jan 21 '13 at 22:24
  • @DmitryBeransky Does he? That's not how I read the question, and he hasn't responded to Ingo's comment yet - If he says it's not what he wants I'll happily delete the answer. – Jeff Jan 21 '13 at 22:26
  • @Jeff Well, you have modified the string. Say I have millions of records in this format. It's not feasible to add a backslash in front of every character. Is there a way to make Java stop converting Unicode escapes? – AkD Jan 21 '13 at 22:31
  • @AkD See my edit; hopefully that will help. Edit: Of course, it's not ideal, since it'll add overhead to printing them, but AFAIK there's no built-in way of doing what you're asking. – Jeff Jan 21 '13 at 22:40
  • @AkD If you have records in this format, then printing them is not a problem. The `\uXXXX` escapes are only replaced in source code. – Ingo Jan 21 '13 at 22:48
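Following Ingo's point: text read at runtime (from a file, a database or a serializer) never passes through the compiler's escape translation, so a record containing backslash sequences prints exactly as stored. If such records ever need to be turned back into the characters, a small decoder is enough. `UnicodeUnescaper.decode` is a hypothetical helper, not a library API; it only handles the four-hex-digit form and copies malformed sequences unchanged.

```java
class UnicodeUnescaper {
    // Replaces every backslash, 'u', four-hex-digit sequence in s with the
    // char it names; all other characters are copied unchanged.
    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 6 <= s.length()
                    && s.charAt(i + 1) == 'u' && isHex(s, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if every char in s[from, to) is a hex digit.
    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stands in for a record read at runtime: it really contains backslashes.
        String record = "\\u65e5\\u672c\\u8a9e\\u6587\\u5b57\\u5217";
        System.out.println(record.length());         // 36
        System.out.println(decode(record).length()); // 6
    }
}
```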

Escape your backslashes:

String s = "\\u65e5\\u672c\\u8a9e\\u6587\\u5b57\\u5217";
fazhool

One could do the following for each character ch in the string:

  int c = ch;                          // widen the char to its numeric code
  System.out.printf("\\u%04x", c);     // prints e.g. \u65e5
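A self-contained version of the same idea that builds a `String` instead of printing (the `UnicodeEscaper` class and the `keepAscii` flag are illustrative additions, not part of this answer). Because it works on `char`s, a character outside the Basic Multilingual Plane comes out as two escapes, one per surrogate, which is also how Java source spells it.

```java
class UnicodeEscaper {
    // Returns s with every char written as a backslash-u escape with four
    // lowercase hex digits. If keepAscii is true, printable ASCII
    // (0x20 to 0x7e) is copied unchanged instead.
    static String escape(String s, boolean keepAscii) {
        StringBuilder out = new StringBuilder(s.length() * 6);
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (keepAscii && ch >= 0x20 && ch <= 0x7e) {
                out.append(ch);
            } else {
                out.append(String.format("\\u%04x", (int) ch));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "\u65e5\u672c\u8a9e\u6587\u5b57\u5217";
        System.out.println(escape(s, false));
    }
}
```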
Ingo