Wrong charset? Java program gets a damaged string from args[0]

Question

I ran my Java program on Windows 10 (default charset: Big5), and args[0] came through as a garbled string.

I could not convert the damaged string (args[0]) back into a readable string with any charset.

IDE(UTF-8)-->JVM(UTF-8 damaged string)-->main(UTF-8 damaged string)

UTF-8 --> Big5: 设定 --> ?定
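The conversion in the line above can be reproduced without the command line at all. A minimal sketch, assuming the JDK includes the Big5 charset (OpenJDK ships it in the jdk.charsets module); the class name Big5RoundTrip is hypothetical. String.getBytes(Charset) does not throw on characters the target charset cannot represent: it substitutes the charset's replacement bytes, which for Big5 is the single byte 0x3F ('?'). After that, decoding with Big5 or any other charset can only give back '?', which is why no charset can repair the damaged args[0].

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class: reproduces "UTF-8 --> Big5" without involving the command line.
public class Big5RoundTrip {

    // Encodes text with cs and decodes the result with the same charset.
    // getBytes(Charset) replaces unmappable characters with the charset's
    // replacement bytes instead of throwing; for Big5 that is '?' (0x3F).
    static String roundTrip(String text, Charset cs) {
        byte[] bytes = text.getBytes(cs);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // Assumes this JDK ships Big5 (OpenJDK: module jdk.charsets).
        Charset big5 = Charset.forName("Big5");
        // U+8BBE U+5B9A, the text typed in the Eclipse arguments field,
        // written as escapes so the source file encoding does not matter.
        String typed = "\u8BBE\u5B9A";

        String damaged = roundTrip(typed, big5);
        System.out.println(Arrays.toString(damaged.getBytes(big5))); // [63, -87, 119]
        // The first character is now a genuine '?' (U+003F); the original code point is gone.
        System.out.println(damaged.charAt(0) == '?');                 // true
    }
}
```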

I think the argument had already been damaged by the time it was passed to the JVM.
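If the argument really passed through Big5 before reaching the JVM, the characters that went missing should be exactly the ones Big5 cannot encode. A rough way to check this is to ask a CharsetEncoder about each code point with canEncode, instead of letting getBytes substitute '?' silently. This is a sketch; the class and method names are made up for illustration, and it assumes Big5 is available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports code points a charset cannot represent,
// instead of letting getBytes() replace them with '?' silently.
public class UnmappableChars {

    static List<Integer> unmappable(String text, Charset cs) {
        CharsetEncoder encoder = cs.newEncoder();
        List<Integer> missing = new ArrayList<>();
        // Iterate by code point so characters outside the BMP are tested whole.
        text.codePoints().forEach(cp -> {
            if (!encoder.canEncode(new String(Character.toChars(cp)))) {
                missing.add(cp);
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        String typed = "\u8BBE\u5B9A"; // U+8BBE U+5B9A
        // Assumes Big5 is available in this JDK.
        for (Integer cp : unmappable(typed, Charset.forName("Big5"))) {
            System.out.printf("Big5 cannot encode U+%04X%n", cp);
        }
        System.out.println("UTF-8 unmappable count: " + unmappable(typed, StandardCharsets.UTF_8).size());
    }
}
```

For 设定 this reports only U+8BBE (设) for Big5 and nothing for UTF-8. That is consistent with the observed ?定, though on its own it does not prove Big5 is the charset involved.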

Program arguments in Eclipse Run Configurations:

设定

VM arguments in Eclipse Run Configurations:

-Dfile.encoding=UTF-8

Common tab in Eclipse Run Configurations, encoding: UTF-8

Java source file (CharsetARGTest.java) encoding: UTF-8
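Note that -Dfile.encoding=UTF-8 and the Eclipse encoding setting control how Java itself reads and writes text (the default charset and the console). They do not control how the operating system hands the command line to the java launcher. On Windows, the launcher is commonly reported to receive its arguments in the system ANSI code page (code page 950 for a Big5 locale), which would match the ???(Big5) step; treat that as an assumption to verify rather than a documented guarantee. A small sketch (hypothetical class name) that prints the relevant values: sun.jnu.encoding is an internal OpenJDK property, not a standard API, and on Windows it typically reports the ANSI code page; native.encoding exists only on JDK 17 and later. Either may be null.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic: collects the encodings the running JVM reports.
public class EncodingProperties {

    static Map<String, String> encodingInfo() {
        Map<String, String> info = new LinkedHashMap<>();
        // The default charset; -Dfile.encoding influences this value.
        info.put("defaultCharset", Charset.defaultCharset().name());
        info.put("file.encoding", System.getProperty("file.encoding"));
        // Implementation-specific: OpenJDK's encoding for platform strings such as
        // file names; may be null on other JVMs.
        info.put("sun.jnu.encoding", System.getProperty("sun.jnu.encoding"));
        // JDK 17+: the encoding derived from the OS locale; null on older JDKs.
        info.put("native.encoding", System.getProperty("native.encoding"));
        return info;
    }

    public static void main(String[] args) {
        encodingInfo().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running this from the same Eclipse run configuration would show whether the platform encoding differs from the UTF-8 set with -Dfile.encoding.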

CharsetARGTest.java

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class CharsetARGTest {
    public static void main(String[] args) throws UnsupportedEncodingException {
//      arg:
//      IDE(UTF-8)-->???(Big5)-->JVM(damaged string)-->main(damaged string)
        
        System.out.println("default charset:"+Charset.defaultCharset());
        
        String str="设定";
        System.out.println("String(UTF-8):"+str);
        System.out.println("UTF-8 bytes:");
        dump_bytes(str.getBytes("UTF-8"));
        
        System.out.println("GBK bytes to String(GBK-->GBK):"+new String(str.getBytes("GBK"),"GBK"));
        System.out.println("GBK bytes:");
        dump_bytes(str.getBytes("GBK"));
        
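        // Explicit charset: getBytes() with no argument would use the default charset,
        // which is UTF-8 here only because of -Dfile.encoding=UTF-8.
        System.out.println("UTF-8 bytes to String(UTF-8-->UTF-8):"+new String(str.getBytes("UTF-8"),"UTF-8"));
        System.out.println("UTF-8 bytes:");
        dump_bytes(str.getBytes("UTF-8"));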
        
        System.out.println("Big5 bytes to String(Big5-->Big5):"+new String(str.getBytes("Big5"),"Big5"));
        System.out.println("Big5 bytes:");
        dump_bytes(str.getBytes("Big5"));
        
        
        System.out.println("\n");
        
        System.out.println("arg:"+args[0]);
        System.out.println("Big5 bytes:");
        dump_bytes(args[0].getBytes("Big5"));
        System.out.println("UTF-8 bytes:");
        dump_bytes(args[0].getBytes("UTF-8"));
        
        
        
        
    }
    static void dump_bytes(byte[] a) throws UnsupportedEncodingException {
        for(byte c:a) {
            System.out.print(c+" ");
        }
        System.out.print("\n");
    }
    static boolean bytesCompare(byte[] a, byte[] b) {

        if (a.length != b.length)
            return false;

        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i])
                return false;
        }
        return true;

    }
}
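The bytesCompare helper above is never called, and the JDK already provides the same check as java.util.Arrays.equals(byte[], byte[]). The sketch below (the class name ArgCompare and the fallback value are hypothetical) also shows why the charset used for the comparison matters: in Big5, 设定 and ?定 both encode to 63 -87 119, so a Big5 comparison reports them equal, while UTF-8, which can represent every character, tells them apart.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class.
public class ArgCompare {

    // Same check as the question's bytesCompare(a.getBytes(cs), b.getBytes(cs)),
    // using the JDK's Arrays.equals instead of a hand-written loop.
    static boolean sameBytes(String a, String b, Charset cs) {
        return Arrays.equals(a.getBytes(cs), b.getBytes(cs));
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5"); // assumes Big5 is available
        String typed = "\u8BBE\u5B9A";          // U+8BBE U+5B9A, as typed in Eclipse
        // Hypothetical fallback when run without arguments: the damaged value "?" + U+5B9A.
        String arg = args.length > 0 ? args[0] : "?\u5B9A";

        // Big5 maps U+8BBE to '?', so both strings become 63 -87 119 and look equal.
        System.out.println("equal as Big5 bytes:  " + sameBytes(typed, arg, big5));
        // UTF-8 represents every character, so the difference shows up.
        System.out.println("equal as UTF-8 bytes: " + sameBytes(typed, arg, StandardCharsets.UTF_8));
    }
}
```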

Output

default charset:UTF-8
String(UTF-8):设定
UTF-8 bytes:
-24 -82 -66 -27 -82 -102 
GBK bytes to String(GBK-->GBK):设定
GBK bytes:
-55 -24 -74 -88 
UTF-8 bytes to String(UTF-8-->UTF-8):设定
UTF-8 bytes:
-24 -82 -66 -27 -82 -102 
Big5 bytes to String(Big5-->Big5):?定
Big5 bytes:
63 -87 119 


arg:?定
Big5 bytes:
63 -87 119 
UTF-8 bytes:
63 -27 -82 -102
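The signed decimal bytes printed by dump_bytes are easier to compare against charset tables as unsigned hex. A sketch (hypothetical class name HexDump) that formats each byte via b & 0xFF: the leading 63 in both dumps of arg is 0x3F, the ASCII '?', and only 定 survives, as A9 77 in Big5 and E5 AE 9A in UTF-8. In other words, the first character was already a plain question mark by the time main ran.

```java
import java.util.StringJoiner;

// Hypothetical helper: prints bytes as unsigned two-digit hex.
public class HexDump {

    // Formats bytes as unsigned hex, e.g. -87 -> "A9", 63 -> "3F".
    static String hex(byte[] bytes) {
        StringJoiner sj = new StringJoiner(" ");
        for (byte b : bytes) {
            sj.add(String.format("%02X", b & 0xFF)); // mask to 0..255 before formatting
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        // Byte values copied from the "arg:" section of the output above.
        byte[] argBig5 = {63, -87, 119};
        byte[] argUtf8 = {63, -27, -82, -102};
        System.out.println("Big5:  " + hex(argBig5)); // 3F A9 77
        System.out.println("UTF-8: " + hex(argUtf8)); // 3F E5 AE 9A
    }
}
```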

How did you find out about the step between IDE and JVM -> `???(Big5)`? — KunLun, Aug 29 '20 at 08:24
Just guessing. I think the argument was damaged because it is converted from UTF-8 to Big5 in step ???(Big5). — , Aug 29 '20 at 08:34
You can see that args[0] and str produce the same Big5 bytes. — , Aug 29 '20 at 08:37
That doesn't mean there is a `Big5 charset` step; there can be other charsets that can't represent your character either. Also, just because the transfer between IDE and JVM results in a damaged string doesn't mean there is an extra step between them. — KunLun, Aug 29 '20 at 11:15
Thanks for your explanation. Maybe I should study the IDE and JVM more. — , Aug 29 '20 at 16:05
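KunLun's point can be checked directly: a '?' in place of 设 only says that some charset along the way could not represent it, and many charsets share that limitation. A sketch (hypothetical names) that asks every charset available in the running JVM whether it can encode 设 (U+8BBE): the list should include UTF-8 and GBK (the output above shows GBK encoding it as -55 -24) but not Big5 or ISO-8859-1, among many others it omits.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: which installed charsets can represent a given text?
public class WhichCharsets {

    static List<String> charsetsThatCanEncode(String text) {
        List<String> names = new ArrayList<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            // Some charsets are decode-only; newEncoder() would throw for them.
            if (cs.canEncode() && cs.newEncoder().canEncode(text)) {
                names.add(cs.name()); // canonical name, e.g. "UTF-8", "GBK"
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> ok = charsetsThatCanEncode("\u8BBE"); // U+8BBE
        System.out.println(ok.size() + " of " + Charset.availableCharsets().size()
                + " charsets can encode U+8BBE");
        System.out.println("Big5: " + ok.contains("Big5") + ", GBK: " + ok.contains("GBK")
                + ", UTF-8: " + ok.contains("UTF-8"));
    }
}
```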

0 Answers