I'm thinking about using UTF-16 in an application, but I'm having difficulty understanding some of the key concepts, in particular surrogates and combining characters.
As I understand it, surrogates are used in UTF-16 to encode code points that need more than 16 bits. So if a character requires a surrogate pair, it takes up 32 bits (two 16-bit code units) in UTF-16.
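To check whether I've got this part right, here is a minimal Java sketch (Java only because its strings use UTF-16 internally; U+1F600 is just an arbitrary non-BMP example and the class name is a placeholder):

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane, so in UTF-16
        // it is stored as a surrogate pair: two 16-bit code units.
        String s = new String(Character.toChars(0x1F600));
        System.out.println(s.length());                              // 2 code units
        System.out.println(s.codePointCount(0, s.length()));         // 1 code point
        System.out.println(Character.isHighSurrogate(s.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));   // true
    }
}
```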
Combining characters allow alternative forms, for compatibility with older encodings. So, for example, I can write the character ä also as a followed by ◌̈:
ä: U+00E4
a: U+0061
◌̈: U+0308 (Combining diaeresis)
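This is how I picture the two forms in code, as a minimal Java sketch using java.text.Normalizer (the class name is just a placeholder):

```java
import java.text.Normalizer;

public class CombiningDemo {
    public static void main(String[] args) {
        String composed = "\u00E4";      // ä as a single code point
        String decomposed = "a\u0308";   // a followed by the combining diaeresis
        System.out.println(composed.length());    // 1 UTF-16 code unit
        System.out.println(decomposed.length());  // 2 UTF-16 code units
        System.out.println(composed.equals(decomposed)); // false: different code units
        // Normalizing to NFC composes a + U+0308 back into U+00E4
        System.out.println(
            Normalizer.normalize(decomposed, Normalizer.Form.NFC).equals(composed)); // true
    }
}
```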
So if I use surrogates together with combining characters, it can happen that a single character needs 2 × 32 bits to encode. That doesn't happen in my example, of course, since no surrogates are involved. But could it happen with other characters?
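For reference, this is how I would count the 16-bit code units per code point, again as a Java sketch (Character.charCount returns 2 for any code point that needs a surrogate pair; the string and class name are placeholders):

```java
public class CodeUnitCount {
    public static void main(String[] args) {
        String decomposed = "a\u0308";   // my ä example in decomposed form
        int units = 0;
        // Walk the string code point by code point and count how many
        // 16-bit code units each one occupies.
        for (int i = 0; i < decomposed.length(); ) {
            int cp = decomposed.codePointAt(i);
            int count = Character.charCount(cp); // 1, or 2 for a surrogate pair
            System.out.printf("U+%04X -> %d UTF-16 code unit(s)%n", cp, count);
            units += count;
            i += count;
        }
        System.out.println("Total 16-bit code units: " + units); // 2 here
    }
}
```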