VS2013 and Unicode literals give warnings

Question

What is wrong with this code:

static const std::vector<wchar_t> glyphs(
    {L'A', L'B', L'C', L'D', L'E', L'F', L'G', L'H',
     L'I', L'J', L'K', L'L', L'M', L'N', L'O', L'P',
     L'Q', L'R', L'S', L'T', L'U', L'V', L'W', L'X',
     L'Y', L'Z', L' ', L'Ä', L'Ö', L'Ü', L'Å', L' ',
     L'a', L'b', L'c', L'd', L'e', L'f', L'g', L'h',
     L'i', L'j', L'k', L'l', L'm', L'n', L'o', L'p',
     L'q', L'r', L's', L't', L'u', L'v', L'w', L'x',
     L'y', L'z', L' ', L'ä', L'ö', L'ü', L'å', L'\"',
     L'0', L'1', L'2', L'3', L'4', L'5', L'6', L'7',
     L'8', L'9', L'!', L'\"',L'#', L'$', L'%', L'&',
     L' ', L'(', L')', L'*', L'+', L',', L'-', L'.',
     L'/', L':', L';', L'<', L'=', L'>', L'?', L' ',
     L'Â', L'À', L'É', L'È', L'Ê', L'Ë', L'Î', L'Ï',
     L'Ô', L'Û', L'Ç', L'â', L'à', L'é', L'è', L'ê',
     L'ë', L'î', L'ï', L'ô', L'û', L'ç', L'\'',L' '});

Exactly the same piece of code compiles without warnings and works with GCC and Clang, but with VS2013 I get:

warning: C4066: characters beyond first in wide-character constant ignored

...for the lines that start with the glyphs 'y', 'Y', 'Â', 'Ô' and 'ë'.

Some of your literals are using Unicode characters that can be represented with **combining characters**, thus making them *multi-character literals*. For example, `Ä` can be represented as either a single Unicode codepoint `U+00C4` or as a sequence of Unicode codepoints `U+0041 U+0308`. I think the compiler is warning you that some of the characters in your array are using multi-character literals that are having some of their extra characters ignored. Make sure your literals are using Unicode Normalization NFC or NFKC so those combining characters are not being used. — Remy Lebeau, Dec 30 '14 at 21:28
@RemyLebeau this post does not contain any combining characters. If it is copied and pasted from the original source, it is highly unlikely that there are combining characters there. — n. m. could be an AI, Dec 30 '14 at 21:39
Do not use such literals, they are not portable. Read your Unicode characters from a file. — n. m. could be an AI, Dec 30 '14 at 21:40
@n.m.: just because there are no combining characters in THIS post does not mean they do not exist in the original source code. Depending on editors and browsers involved, the Unicode data could have gotten normalized during the copy/pasting process. — Remy Lebeau, Dec 30 '14 at 21:48
@RemyLebeau "the Unicode data could have gotten normalized during the copy/pasting process" Theoretically it could, but this is rather unlikely. Not many programs perform Unicode normalization out of the blue. — n. m. could be an AI, Dec 30 '14 at 21:50
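
The portability concern raised in the comments can also be approached without depending on the source file's encoding at all. As a minimal sketch (not something proposed in the thread), the accented glyphs can be spelled with universal character names, which are plain ASCII in the source. The name `accentedGlyphs` is hypothetical, and only the four uppercase accented glyphs from the question's fourth row are shown.

```cpp
#include <vector>

// Hypothetical excerpt of the glyph table: the code points are written as
// universal character names, so the bytes in the source file are pure ASCII
// and the compiler's guess about the file encoding no longer matters.
static const std::vector<wchar_t> accentedGlyphs{
    L'\u00C4',  // A with diaeresis
    L'\u00D6',  // O with diaeresis
    L'\u00DC',  // U with diaeresis
    L'\u00C5',  // A with ring above
};
```

The trade-off is readability: the table is no longer legible at a glance, which is why the comments beside each entry carry the glyph names.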

score 9 · Accepted Answer · answered Dec 30 '14 at 22:01

There's only one way I can repro this: saving the text to a file encoded in UTF-8 without a BOM. Without a BOM, the compiler guesses the system default code page and trips over the two-byte UTF-8 sequences produced by the accented characters, reading each one as two characters.
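
To make the mechanism concrete, here is a small sketch of why one accented glyph turns into two characters. It assumes the default code page is a single-byte one (such as Windows-1252), where every byte counts as one character. The helper names are hypothetical, and the bytes of `Ä` (U+00C4) are written as explicit escapes so the example does not depend on its own file encoding.

```cpp
#include <cstddef>
#include <string>

// UTF-8 encoding of U+00C4 (A with diaeresis): the two bytes C3 84.
const std::string kUtf8AUmlaut = "\xC3\x84";

// A single-byte code page maps every byte to one character, so the number
// of characters the decoder sees is simply the number of bytes.
std::size_t countAsSingleByteCodePage(const std::string& bytes) {
    return bytes.size();
}

// A UTF-8 decoder counts code points: every byte that is not a
// continuation byte (bit pattern 10xxxxxx) starts a new code point.
std::size_t countAsUtf8(const std::string& bytes) {
    std::size_t count = 0;
    for (unsigned char c : bytes) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

Read as UTF-8, the two bytes are one character. Read through a single-byte code page, they are two, so a literal written as `L'Ä'` looks to the compiler like a wide-character constant holding two characters, which is the situation C4066 reports.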

In VS, use File + Save As, click on the arrow on the Save button and select "Save with Encoding". Pick "Unicode (UTF-8 with signature) - Codepage 65001" from the list.
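
When several source files are involved, it can help to check which ones actually carry the signature. The following is a rough sketch, not part of Visual Studio or any toolchain: the function names are hypothetical, and it only tests for the UTF-8 signature bytes EF BB BF at the start of a file.

```cpp
#include <fstream>
#include <string>

// True if the byte sequence begins with the UTF-8 signature EF BB BF.
bool startsWithUtf8Bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Reads the first three bytes of the file at `path` and checks them.
// Returns false if the file cannot be opened or is shorter than three bytes.
bool fileHasUtf8Bom(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    char head[3] = {};
    in.read(head, 3);
    return in.gcount() == 3 && startsWithUtf8Bom(std::string(head, 3));
}
```

Calling `fileHasUtf8Bom` on each source file that contains non-ASCII literals would flag the ones still saved without a signature.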

Hans Passant

Or save as UTF-16LE with BOM. The BOM is the key. – Mordachai Dec 30 '14 at 22:04
Some research here: https://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/. It seems VC has ANSI hardcoded (unless there is a BOM). – kestasx Dec 30 '14 at 22:09
Saving the file with a BOM indeed fixes the warnings, but some of the glyphs appear broken in my application. If I cross-compile my application with MXE + MINGW32 on Ubuntu, it works fine on Windows :) – juzzlin Dec 30 '14 at 22:52
OK, it works now. The problem was that I also had another file that was saved as UTF-8 without a BOM. – juzzlin Dec 30 '14 at 23:16
