
1) Is the order of the high and low surrogate chars within a String fixed? Can I rely on it? Experimentally, on Windows the high surrogate comes first in the String (at the lower index, in terms of String.charAt(int index)). Is this always so on every platform (Linux, etc.)? Is this documented anywhere?

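    public class SurrogateOrder {
        public static void main(String[] args) {
            int[] codePoint = { 0x1F71D };
            String s = new String(codePoint, 0, 1);
            System.out.println(s.length()); // 2
            System.out.println(s); // the U+1F71D character (if the console can display it)

            System.out.println((int) Character.highSurrogate(codePoint[0])); // 55357 (0xD83D)
            System.out.println((int) Character.lowSurrogate(codePoint[0]));  // 57117 (0xDF1D)

            System.out.println((int) s.charAt(0)); // 55357: the high surrogate
            System.out.println((int) s.charAt(1)); // 57117: the low surrogate
        }
    }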

2) Besides, I am a bit confused: is there any correlation between the order of the high/low surrogate code units and endianness? I guess there is no correlation whatsoever and these two notions are orthogonal?

1 Answer


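UTF-16 (not UTF-8, which does not use surrogates at all) mandates that the high (leading) surrogate precede the low (trailing) surrogate. A Java String is a sequence of UTF-16 code units, so that's how Java does it on every platform; the java.lang.Character class documentation states that a supplementary character is represented as a pair of char values, the first from the high-surrogates range and the second from the low-surrogates range. Endianness is a byte order, not a char order. The JVM spec mandates big-endian byte order for the class-file format. Endianness at runtime is determined by the underlying physical platform, but it only becomes visible when chars are converted to bytes (for example UTF-16BE vs. UTF-16LE); it never changes which char comes first. A search engine will turn up the details: http://www.unicode.org/ https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html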
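For illustration (a sketch, not part of the original answer), the following class shows both points with the question's code point U+1F71D. The class name SurrogateByteOrder and the hex helper are hypothetical; the JDK calls used (Character.toChars, String.getBytes with StandardCharsets.UTF_16BE / UTF_16LE) are standard. Character.toChars yields the high surrogate first, and encoding the same String as big- or little-endian UTF-16 swaps only the two bytes inside each code unit, while the high surrogate's bytes stay ahead of the low surrogate's.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateByteOrder {

    // Hypothetical helper: formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int cp = 0x1F71D;
        char[] units = Character.toChars(cp); // UTF-16 code units of the code point

        // Code unit order: high surrogate at index 0, low surrogate at index 1.
        System.out.println(Character.isHighSurrogate(units[0])); // true
        System.out.println(Character.isLowSurrogate(units[1]));  // true
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]); // D83D DF1D

        String s = new String(units);
        // Byte order only appears when the code units are serialized to bytes.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // D8 3D DF 1D
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // 3D D8 1D DF
    }
}
```

The output is the same on any machine, because getBytes with an explicit UTF-16BE or UTF-16LE charset does not depend on the platform's native byte order.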

Lew Bloch