What is the relationship between the Visual Studio "Character Set" configuration and the encoding scheme?

Question

Microsoft's documentation states:

Multibyte character sets, in particular the double-byte character sets (DBCS). Multibyte character sets provide a means to represent the large number of characters in many Asian languages.

DBCS code pages are used for languages such as Japanese and Chinese. In such a code page, some characters have two-byte encodings

Based on the above, two of the four possible cases give results that look contradictory to me, and I have questions about three of the four cases.

Case 1 (contradicting):

I assume that when I choose "Use Multi-Byte Character Set", the following will automatically use a DBCS encoding:

string chineseString = "我是路人";

but instead compiler said:

warning C4566: character represented by universal-character-name '\u6211' cannot be represented in the current code page (1252)

This seems to contradict the setting itself, because code page 1252 only covers Western European languages. Isn't it supposed to use MBCS/DBCS here?

Case 2 (understandable, non-contradicting):

I choose "Use Unicode Character Set"

Now I assume I have to specify an encoding explicitly, so I write:

string chineseString = u8"我是路人";

which works and makes sense to me.

Case 3 (contradicting):

I choose "Use Multi-Byte Character Set": wstring chineseStringW = L"我是路人";

Is it now using a DBCS encoding? If so, why does string not pick up DBCS as well? Or does it work simply because \u6211 fits in a wchar_t?

Case 4:

I choose "Use Unicode Character Set": wstring chineseStringW = L"我是路人";

Is the encoding now UTF-16LE?

It selects the way you talk to operating system functions, the kind that take a string argument. Not std::string, the OS doesn't know beans about C++. Multi-byte made sense twenty years ago, back when there were still the Windows 9x editions that didn't speak Unicode. It stopped making sense a decade ago, so don't use it. [Read this](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/) to get ahead. - Hans Passant, Mar 06 '21 at 23:24
@HansPassant May be worth noting that MBCS has seen a resurrection of sorts since Win10 1903 started supporting UTF-8 manifested [ActiveCodePage](https://learn.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page#set-a-process-code-page-to-utf-8). - dxiv, Mar 07 '21 at 00:12
@HansPassant I really appreciate the link; it answers one of my questions. But I still do not know why case 1 does not work under MBCS. If, as you said, the setting `selects the way you talk to operating system functions`, I do not think my OS only uses Windows-1252. Although my default OS encoding is indeed 1252, I also have Chinese installed. - Dexter, Mar 07 '21 at 14:43

0 Answers