As we know, UTF-16 becomes a variable-length encoding once characters at U+10000 or above are involved. However, .NET, Java, and Windows `WCHAR` UTF-16 strings are treated as if they were fixed-length... What happens if I use a character at U+10000 or above?
And if they do handle characters at U+10000 or above, how do they process them? For example, in .NET and Java a `char` is 16 bits, so a single `char` cannot hold such a character.

(.NET, Java, and Windows are just examples; I'm really asking how characters at U+10000 or above are processed in general. But I think knowing how these platforms handle them would help my understanding.)
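To show what I mean, here is a small Java sketch I put together (the code point U+1F600 is just my own example) demonstrating that a character at U+10000 or above takes up two `char`s as a surrogate pair:

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // U+1F600 (GRINNING FACE) is above U+FFFF, so UTF-16 encodes it
        // as a surrogate pair: two 16-bit chars.
        String s = new String(Character.toChars(0x1F600));

        System.out.println(s.length());                      // 2 -- counts char code units
        System.out.println(s.codePointCount(0, s.length())); // 1 -- counts real characters
        System.out.printf("U+%04X%n", s.codePointAt(0));     // U+1F600

        // Each char on its own is only half a character:
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true
    }
}
```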
Thanks to @dystroy, I now know how they process them. But one problem remains: if a string contains UTF-16 surrogate pairs, a random-access operation such as `str[3]` becomes an O(N) algorithm, because any character can be either 2 or 4 bytes! How is this problem handled?
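To make the concern concrete, here is another Java sketch (again my own example, not taken from any platform's documentation): `charAt(i)` is O(1) but indexes 16-bit code units, so it can land on half a surrogate pair, while reaching the i-th *code point* requires a linear scan, e.g. via `offsetByCodePoints`:

```java
public class RandomAccessDemo {
    public static void main(String[] args) {
        // "a" + GRINNING FACE (U+1F600, a surrogate pair) + "b"
        String s = "a\uD83D\uDE00b";

        // O(1), but indexes code units: index 2 is the LOW surrogate
        // of the pair, not the character 'b'.
        System.out.println(Character.isLowSurrogate(s.charAt(2))); // true

        // To reach the 3rd *code point* (code point index 2), the library
        // must scan from the start -- this is the O(N) step I'm asking about:
        int unitIndex = s.offsetByCodePoints(0, 2);
        System.out.printf("U+%04X%n", s.codePointAt(unitIndex)); // U+0062 ('b')
    }
}
```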