When I try to convert a latin1 String to UTF-8 in Java, something goes wrong. The code is as follows:

    byte[] latin2 = "¦ñ¨ãÓñ²½ìá".getBytes("ISO-8859-1");
    byte[] latin1 = "¦á¨ãÓñ²½ìá".getBytes("ISO-8859-1");
    byte[] utf8 = new String(latin1, "GB2312").getBytes("GB2312");
    byte[] utf81 = new String(latin2, "GB2312").getBytes("GB2312");
    System.out.println(new String(utf8,"GB2312"));
    System.out.println(new String(utf81,"GB2312"));

The output is

 ?ㄣ玉步灬
 ?ㄣ玉步灬

So I'm confused about it. How can I convert latin1 to UTF-8 correctly?

The DB field is:

`name` char(20) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
iameven
  • What is the source code encoding? It looks quite broken already in those string literals. And what is GB2312 doing there? I thought you wanted UTF-8? – Thilo Jan 20 '16 at 07:37
  • @Thilo The source data is `latin1` from MySQL, and the source code is GB2312, so I converted it to GB2312. I tried converting it to UTF-8, but that failed. After your comment I tried GBK, and the result is `︶ㄣ玉步灬`, `︸ㄣ玉步灬`, so GBK may be right – iameven Jan 20 '16 at 07:42
  • If your source code encoding is GB2312, your String literals must also use GB2312. Otherwise the compiler will break them. – Thilo Jan 20 '16 at 07:45
  • How is MySQL involved in this? There is no DB access code here. – Thilo Jan 20 '16 at 07:46
  • @Thilo Thank you for the comment, and sorry for my stupid problem. o(╯□╰)o – iameven Jan 20 '16 at 07:48
  • @Thilo I added the DB field description; the latin1 strings were copied from the DB – iameven Jan 20 '16 at 07:50
  • You cannot just copy latin1 strings into a GB2312 file. Need to match the encoding. – Thilo Jan 20 '16 at 07:54
  • If you're receiving data from a database, then you should be receiving decoded Strings already. You shouldn't have to do any further encoding/decoding until you write it somewhere else. Try writing your data, after receiving it from the DB, to a text file using an `OutputStreamWriter` configured with UTF-8 encoding. This should prove whether your DB setup is correct. – Alastair McCormack Jan 20 '16 at 09:18

1 Answer

The second parameter of a `new String(bytes, charset)` call sets the charset used to decode the byte array (from the Javadoc: "charset: The charset to be used to decode the bytes"). So in your case it should be the charset you used to encode the bytes, "ISO-8859-1":

    new String(latin1, "ISO-8859-1").getBytes("GB2312");
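
To make the decode-then-encode order concrete, here is a minimal sketch of a latin1 to UTF-8 round trip. The class name `CharsetRoundTrip`, the helper `latin1ToUtf8`, and the sample text are made up for illustration, and the sample is written with a Unicode escape so that the source file encoding cannot interfere. It only shows the general rule (decode with the charset the bytes were encoded in, then encode to the target charset); it does not try to repair data that was already stored in the wrong charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {

    // Hypothetical helper: decode the bytes as ISO-8859-1 (the charset they
    // were encoded in), then encode the resulting String as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café", with é given as a Unicode escape so the source file
        // encoding does not matter.
        String original = "caf\u00e9";

        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8Bytes = latin1ToUtf8(latin1Bytes);

        // é is a single byte in ISO-8859-1 and two bytes in UTF-8.
        System.out.println(Arrays.toString(latin1Bytes));
        System.out.println(Arrays.toString(utf8Bytes));

        // Decoding with the matching charset restores the original text.
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8).equals(original));

        // Decoding the ISO-8859-1 bytes as UTF-8 does not.
        System.out.println(new String(latin1Bytes, StandardCharsets.UTF_8).equals(original));
    }
}
```

Using the `StandardCharsets` constants instead of charset name strings also avoids the checked `UnsupportedEncodingException` that `getBytes(String)` and `new String(byte[], String)` declare.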