How to tell if a wchar_t is part of a surrogate pair (UTF-16)?

Question

I've seen a few other posts on this issue, but was unable to find any details on how to determine programmatically whether a code point uses more than one 2-byte (on Windows) wchar_t.

An example:

const wchar_t* s2 = L"\U0002008A"; // The "Han" character
std::wstring in(s2);               // length() == 2

I'd like to know how to determine when a character will have a length() > 1.

Just check for the proper ranges according to the UTF-16 encoding (easy to google). You most likely won't find anything more sophisticated. — Šimon Tóth, Apr 18 '13 at 16:39

score 5 · Answer 1 · answered Apr 18 '13 at 16:42

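Any code point above U+FFFF is encoded in UTF-16 as a surrogate pair: a high surrogate in the range D800-DBFF followed by a low surrogate in the range DC00-DFFF. So any 16-bit unit whose value falls in D800-DFFF is one half of such a pair.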
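
A minimal sketch of that range check, for illustration only. The helper names (is_high_surrogate, is_low_surrogate, combine_surrogates, count_code_points) are made up here and are not part of any standard library. It uses char16_t and std::u16string so that it behaves the same on every platform; this is an assumption made for portability, since wchar_t is only 16 bits wide on some platforms such as Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// High (lead) surrogates occupy 0xD800-0xDBFF.
constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// Low (trail) surrogates occupy 0xDC00-0xDFFF.
constexpr bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Combine a valid surrogate pair into the code point it encodes.
inline char32_t combine_surrogates(char16_t hi, char16_t lo) {
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000 + ((static_cast<char32_t>(hi) - 0xD800) << 10)
                   + (static_cast<char32_t>(lo) - 0xDC00);
}

// Count code points rather than 16-bit units: a high surrogate
// immediately followed by a low surrogate counts as one code point.
// Unpaired surrogates are counted as one unit each.
inline std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() && is_low_surrogate(s[i + 1]))
            ++i;  // skip the low half of the pair
        ++n;
    }
    return n;
}
```

On Windows, where wchar_t is a 16-bit UTF-16 unit, the same comparisons should apply to each element of a std::wstring. On platforms where wchar_t is 32 bits wide, a code point such as U+2008A fits in a single wchar_t and no surrogates appear.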

bames53
