I'm struggling with Unicode characters in Java 10.
I'm using the java.text.BreakIterator class.
For one test string I get this output:
myString="a𝓞b" hex=0061d835dcde0062
myString.length()=4
myString.codePointCount(0,myString.length())=3
BreakIterator output:
a hex=0061
𝓞 hex=d835dcde
b hex=0062
Seems correct.
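The output above comes from code along these lines. This is a minimal sketch rather than my exact code: the class name GraphemeDump and the helpers split and hex are made up for illustration, and it assumes BreakIterator.getCharacterInstance() with the default locale.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

class GraphemeDump {
    // Splits s at the boundaries reported by the character BreakIterator.
    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            parts.add(s.substring(start, end));
        }
        return parts;
    }

    // Renders every UTF-16 code unit of s as four lowercase hex digits.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a', U+1D4DE as a surrogate pair, 'b'
        String myString = "a\uD835\uDCDEb";
        System.out.println("myString=\"" + myString + "\" hex=" + hex(myString));
        System.out.println("myString.length()=" + myString.length());
        System.out.println("myString.codePointCount(0,myString.length())="
                + myString.codePointCount(0, myString.length()));
        System.out.println("BreakIterator output:");
        for (String part : split(myString)) {
            System.out.println(part + " hex=" + hex(part));
        }
    }
}
```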
Using the same Java code on a second string, I get this output:
myString="G̲íl" hex=0047033200ed006c
myString.length()=4
myString.codePointCount(0,myString.length())=4
BreakIterator output:
G̲ hex=00470332
í hex=00ed
l hex=006c
This seems correct too, EXCEPT for codePointCount=4.
Why isn't it 3, and is there a way to get 3 without using BreakIterator?
My goal is to determine whether every displayed character of a string is a single 16-bit char, or whether surrogate or combining characters are present.
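To make the goal concrete, the check I have in mind might look something like the sketch below. It is only a guess at an approach: it assumes "combining" means the Unicode general categories Mn, Mc, and Me, which may be too narrow or too broad, and the class name CharKinds and both method names are hypothetical.

```java
class CharKinds {
    // True if any UTF-16 code unit in s is half of a surrogate pair.
    static boolean hasSurrogates(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isSurrogate(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // True if any code point in s is a mark (assumed here: Mn, Mc, or Me).
    static boolean hasCombiningMarks(String s) {
        return s.codePoints().anyMatch(cp -> {
            int type = Character.getType(cp);
            return type == Character.NON_SPACING_MARK
                    || type == Character.COMBINING_SPACING_MARK
                    || type == Character.ENCLOSING_MARK;
        });
    }

    public static void main(String[] args) {
        String first = "a\uD835\uDCDEb";   // contains a surrogate pair
        String second = "G\u0332\u00EDl";  // contains U+0332 after 'G'
        System.out.println(hasSurrogates(first) + " " + hasCombiningMarks(first));
        System.out.println(hasSurrogates(second) + " " + hasCombiningMarks(second));
    }
}
```

If both methods return false, every char in the string would be a standalone 16-bit unit under these assumptions.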