
Just look at the code below:

    try {
        String str = "上海上海";
        String gb2312 = new String(str.getBytes("utf-8"), "gb2312");
        String utf8 = new String(gb2312.getBytes("gb2312"), "utf-8");
        System.out.println(str.equals(utf8));
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }

It prints false!

I ran this code under both JDK 7 and JDK 8, and my IDE's file encoding is set to UTF-8.

Can anyone help me?

查晓明
  • Java strings are UTF-16 internally, and the String class doesn't carry a character encoding. That means no matter what kind of file you read, a Java String containing certain characters will always be Unicode. – Kalpesh Soni Nov 05 '15 at 02:27
  • Your code is meaningless. You're taking a UTF-16 String (how Java stores Strings), encoding it as a UTF-8 byte stream, then decoding that byte stream **as if** it were GB2312 encoded. You end up with garbage! What you might have meant to do is read a UTF-8 encoded byte stream (e.g. from a file) and output a GB2312 encoded byte stream (e.g. to another file), but that's not what you're doing. – Andreas Nov 05 '15 at 02:29
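To make the comments above concrete, here is a minimal sketch contrasting the question's mismatched round trip with a matched one. It assumes the JDK ships the `GB2312` charset (standard Oracle/OpenJDK builds include it as an extended charset); the class and method names are illustrative only. The string is written with Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // GB2312 is not one of the StandardCharsets constants; this assumes the JDK provides it.
    static final Charset GB2312 = Charset.forName("GB2312");

    // The question's approach: UTF-8 bytes decoded as if they were GB2312, then round-tripped.
    static String mismatched(String s) {
        String garbled = new String(s.getBytes(StandardCharsets.UTF_8), GB2312);
        return new String(garbled.getBytes(GB2312), StandardCharsets.UTF_8);
    }

    // Matching pairs: bytes are always decoded with the charset that produced them.
    static String matched(String s) {
        byte[] gbBytes = s.getBytes(GB2312);   // String -> GB2312 bytes
        return new String(gbBytes, GB2312);    // GB2312 bytes -> the same String
    }

    public static void main(String[] args) {
        String str = "\u4E0A\u6D77\u4E0A\u6D77"; // "上海上海"
        // The String itself has no encoding; each charset just yields a different byte array.
        System.out.println(str.getBytes(StandardCharsets.UTF_8).length); // 12
        System.out.println(str.getBytes(GB2312).length);                 // 8
        System.out.println(str.equals(mismatched(str)));                 // false
        System.out.println(str.equals(matched(str)));                    // true
    }
}
```

The mismatched version is lossy because the UTF-8 bytes contain sequences that are not valid GB2312, and the `String` constructor silently replaces those with U+FFFD, so the original text cannot be recovered.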

2 Answers

    String gb2312 = new String(str.getBytes("utf-8"), "gb2312");

This statement is incorrect. The String constructor expects a byte array together with the charset that was actually used to produce those bytes. Here you are saying the bytes are UTF-8, but you are telling the constructor they are GB2312.
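One way to see that the bytes and the charset must match is to decode strictly. The `String` constructor always substitutes U+FFFD for bad input, whereas a `CharsetDecoder` configured with `CodingErrorAction.REPORT` throws instead. A minimal sketch, assuming the `GB2312` charset is available; the class and method names are illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Decodes bytes with the given charset, throwing instead of silently inserting U+FFFD.
    static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                 .onMalformedInput(CodingErrorAction.REPORT)
                 .onUnmappableCharacter(CodingErrorAction.REPORT)
                 .decode(ByteBuffer.wrap(bytes))
                 .toString();
    }

    // Returns true if the bytes form valid text in the given charset.
    static boolean decodesCleanly(byte[] bytes, Charset cs) {
        try {
            decodeStrict(bytes, cs);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = "\u4E0A\u6D77\u4E0A\u6D77".getBytes(StandardCharsets.UTF_8); // "上海上海"
        System.out.println(decodesCleanly(utf8Bytes, StandardCharsets.UTF_8));    // true: matching charset
        System.out.println(decodesCleanly(utf8Bytes, Charset.forName("GB2312"))); // false: mismatched charset
    }
}
```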

Kalpesh Soni

What you are looking for is the encoding/decoding that happens when you do output/input.

As @kalpesh said, internally it is all Unicode. If you want to READ a stream in a specific encoding and then WRITE it in a different one, you have to specify the encoding twice: once for the conversion from bytes (in the input stream) to strings (in Java), and once for the conversion from strings (in Java) to bytes (in the output stream), like so:

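    // Needs: import java.io.*;
    InputStream is = new FileInputStream("utf8_encoded_text.txt");
    OutputStream os = new FileOutputStream("gb2312_encoded.txt");

    // Decode the input bytes as UTF-8 ...
    Reader r = new InputStreamReader(is, "utf-8");
    BufferedReader br = new BufferedReader(r);
    // ... and encode the output characters as GB2312
    Writer w = new OutputStreamWriter(os, "gb2312");
    BufferedWriter bw = new BufferedWriter(w);

    String s;
    while ((s = br.readLine()) != null) {
        bw.write(s);
        bw.newLine(); // readLine() strips line terminators, so write one back
    }
    br.close();
    bw.close(); // flushes the buffer and closes the underlying stream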

Of course, you still have to do proper exception handling to make sure everything is closed properly.
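A minimal sketch of that exception handling using try-with-resources (Java 7+), which closes and flushes both files even when an exception is thrown. It swaps in the `java.nio.file.Files` factory methods instead of wrapping streams by hand, and assumes the `GB2312` charset is available; the class and method names are illustrative, and the file names are the ones from the example above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Transcode {
    // Reads `in` as UTF-8 and writes it to `out` as GB2312, line by line.
    // try-with-resources closes both files (writer first, then reader) even on failure.
    static void utf8ToGb2312(Path in, Path out) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter bw = Files.newBufferedWriter(out, Charset.forName("GB2312"))) {
            String line;
            while ((line = br.readLine()) != null) {
                bw.write(line);
                bw.newLine(); // readLine() drops the terminator; newLine() writes the platform's
            }
        }
    }

    public static void main(String[] args) throws IOException {
        utf8ToGb2312(Paths.get("utf8_encoded_text.txt"), Paths.get("gb2312_encoded.txt"));
    }
}
```

A design note: unlike `InputStreamReader`/`OutputStreamWriter` built from a charset name, the readers and writers returned by `Files.newBufferedReader`/`newBufferedWriter` report malformed input or characters that GB2312 cannot represent as an exception rather than silently substituting replacement characters.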

rmalchow
  • Thanks! Actually, what I want to do is send a request to a third-party interface that needs its parameters in GB2312. So I can never have a String that is "in GB2312", right? How can I achieve this with an HTTP request? – 查晓明 Nov 07 '15 at 08:57
  • What I am describing works for any stream. Here I used a FileInputStream and a FileOutputStream as an example; it should work the same with a ServletInputStream/ServletOutputStream. However, you have to be aware of the HTTP side of things as well. In any HTTP request or response you should have a header declaring the encoding, because according to the HTTP standard it would otherwise fall back to ISO-8859-1. That generally won't work for Chinese, so it's better to set it explicitly. You should probably read up on that. – rmalchow Nov 09 '15 at 04:23
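For the HTTP case, a minimal sketch under loud assumptions: that the third-party interface accepts an `application/x-www-form-urlencoded` POST whose values are percent-encoded GB2312 bytes, and that the `GB2312` charset is available. The endpoint URL and the parameter name `city` are hypothetical. The value stays an ordinary Java String; GB2312 only comes into play when it is turned into bytes by `URLEncoder.encode(..., "GB2312")`, and the charset is declared in the `Content-Type` header.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class Gb2312Post {
    // Builds a form body, percent-encoding each name and value from its GB2312 bytes.
    static String formBody(Map<String, String> params) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), "GB2312"))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), "GB2312"));
        }
        return sb.toString();
    }

    // POSTs the form to `url` (hypothetical endpoint), declaring the charset in the header.
    static int post(String url, Map<String, String> params) throws IOException {
        byte[] body = formBody(params).getBytes("US-ASCII"); // percent-encoded, so plain ASCII
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=GB2312");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("city", "\u4E0A\u6D77"); // "上海"; hypothetical parameter name
        System.out.println(formBody(params));
        // post("http://example.com/api", params); // hypothetical endpoint, left commented out
    }
}
```

If the interface expects a GET instead, the same `formBody` output could be appended to the URL as the query string; which form it accepts depends on that interface's documentation.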