Is there any way to get the number of characters (in terms of the underlying Unicode code points) stored in a CFString object in the CoreFoundation framework?
There is a function available, CFStringGetLength, but it does not do what its name suggests.
Example: I am trying to get the length of a string containing a single character (the letter "peep" of the Shavian alphabet, U+10450), which lies in the second Unicode plane (the SMP).
UInt8 arr[] = {0xf0, 0x90, 0x91, 0x90}; // U+10450 encoded as UTF-8
CFStringRef r = CFStringCreateWithBytes(kCFAllocatorDefault, arr, sizeof(arr),
                                        kCFStringEncodingUTF8, false);
CFIndex length = CFStringGetLength(r); // yields 2, not 1
The documentation states that it returns:
The number (in terms of UTF-16 code pairs) of characters stored in theString.
As you can see, this sentence is contradictory: the number of characters is not always equal to the number of UTF-16 code units. However, the part in parentheses is more accurate: the actual result of the function is the number of UTF-16 code units. In my example the result is 2 (the length of the surrogate pair required to encode the character in UTF-16), whereas the function name suggests, in my opinion, that the result would be 1.
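To make the discrepancy concrete, here is a minimal plain-C sketch of the count I am after, computed by hand from UTF-16 code units. The helper name count_code_points is my own and is not part of CoreFoundation; it assumes the code units have already been copied into a buffer, and it counts an unpaired surrogate as one code point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CoreFoundation function): counts Unicode code
 * points in a buffer of n UTF-16 code units. A high surrogate immediately
 * followed by a low surrogate is counted once; an unpaired surrogate is
 * counted as one code point as well. */
static size_t count_code_points(const uint16_t *units, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int is_high = units[i] >= 0xD800 && units[i] <= 0xDBFF;
        if (is_high && i + 1 < n &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            i++; /* skip the low half of the surrogate pair */
        }
        count++;
    }
    return count;
}
```

For the Shavian letter above, the UTF-16 form is the pair 0xD801 0xDC50, so this helper gives 1 where CFStringGetLength gives 2. With a CFString, the code units could presumably be copied out with CFStringGetCharacters first, but I am looking for something that does not require a manual loop like this.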
I'd like to find a way to get the number of characters in terms of Unicode code points. Is there any way to do this in CoreFoundation?