I have a small test example like this:
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) {
        String s = "🇻🇺"; // the Vanuatu flag emoji
        System.out.println(s);
        System.out.println(s.length());
        System.out.println(s.toCharArray().length);
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(s.getBytes(StandardCharsets.UTF_16).length);
        System.out.println(s.codePointCount(0, s.length()));
        System.out.println(Character.codePointCount(s, 0, s.length()));
    }
}
And the result is:
🇻🇺
4
4
8
10
2
2
I cannot understand why one Unicode character, the Vanuatu flag, gives a length of 4, 8 bytes in UTF-8, and 10 bytes in UTF-16. I know Java uses UTF-16 and needs one char (2 bytes) per code point, but it confuses me that one Unicode character takes 4 chars; I thought it would only need 2 chars, yet the result is 4. Can someone explain this fully to help me understand? Many thanks.
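For reference, here is a minimal sketch (assuming the string really is the Vanuatu flag 🇻🇺, written below with its Unicode escapes; the class name FlagBreakdown is just for illustration) that prints each char and each code point separately, showing where the 4 chars and 2 code points come from:

import java.nio.charset.StandardCharsets;

public class FlagBreakdown {
    public static void main(String[] args) {
        // Illustrative string: the Vanuatu flag, i.e. the two regional indicator
        // code points U+1F1FB (V) and U+1F1FA (U), written as surrogate-pair escapes.
        String s = "\uD83C\uDDFB\uD83C\uDDFA";

        // Each code point lies outside the BMP, so it is stored as a surrogate
        // pair: 2 chars per code point, 4 chars in total.
        for (int i = 0; i < s.length(); i++) {
            System.out.printf("char[%d] = U+%04X%n", i, (int) s.charAt(i));
        }

        // Iterating by code point yields only the 2 logical code points.
        s.codePoints().forEach(cp -> System.out.printf("code point U+%X%n", cp));

        // Each supplementary code point takes 4 bytes in UTF-8 (8 bytes total),
        // and getBytes(UTF_16) prepends a 2-byte BOM to the 8 bytes of code units (10 bytes total).
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(s.getBytes(StandardCharsets.UTF_16).length);
    }
}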