
In this link you'll find Table 1, which I reproduce below. Notice that the heading of the third column of this table reads: UTF-16LE w/ or w/o BOM. I was able to save my file containing the snippet below,

#include <iostream>
int main()
{
    char c[] = u8"屰";
    int i = 1;
}

by selecting the menu option Advanced Save Options... under the File menu in VS2015, with the Unicode codepage 1200, which is exactly the one corresponding to the UTF-16LE encoding with BOM. This can be verified in the second picture below, where I pasted a copy of the file image obtained with the binary editor. One can see that the first two bytes of the file are 0xFF 0xFE, which represent the BOM for the UTF-16LE encoding. But I wasn't able to find an option in the Advanced Save Options... dialog box for saving my file with the UTF-16LE encoding without BOM. How should I do this?

Table 1 - Example of results today when compiling code with various encodings.

| File encoding | UTF-8 w/ BOM | UTF-16LE w/ or w/o BOM | UTF-8 w/o BOM | DBCS (936) |
|---|---|---|---|---|
| Bytes in source file representing 屰 | E5 B1 B0 | 70 5C | E5 B1 B0 | 8C DB |
| Source conversion | UTF-8 -> UTF-8 | UTF-16LE -> UTF-8 | 1252 -> UTF-8 | 1252 -> UTF-8 |
| Internal (UTF-8) representation | E5 B1 B0 | E5 B1 B0 | C3 A5 C2 B1 C2 B0 | C5 92 C3 9B |
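The "1252 -> UTF-8" columns of the table can be checked by hand with a small sketch like the one below. It is only an illustration, not the compiler's actual conversion routine: `cp1252_to_utf8` and `kCp1252High` are hypothetical names, and mapping the five unassigned CP1252 bytes to the C1 control with the same value is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Code points for CP1252 bytes 0x80..0x9F. The five unassigned bytes
// (81, 8D, 8F, 90, 9D) are mapped to the C1 control with the same value,
// which is an assumption of this sketch.
static const std::uint16_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};

// Hypothetical helper: treat every input byte as one CP1252 character
// and re-encode the resulting code points as UTF-8.
std::vector<std::uint8_t> cp1252_to_utf8(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    for (std::uint8_t b : in) {
        // Outside 0x80..0x9F, CP1252 coincides with Latin-1 (byte == code point).
        std::uint32_t cp = (b >= 0x80 && b <= 0x9F) ? kCp1252High[b - 0x80] : b;
        assert(cp <= 0xFFFF);  // every CP1252 character is in the BMP
        if (cp < 0x80) {
            out.push_back(static_cast<std::uint8_t>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Feeding it the UTF-8 bytes E5 B1 B0 yields C3 A5 C2 B1 C2 B0, and the DBCS bytes 8C DB yield C5 92 C3 9B, matching the last two columns. The UTF-16LE bytes 70 5C are plain ASCII under this reading and come out unchanged as 70 5C.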


Mark Tolonen
João Afonso

1 Answer


It's not an option in VS2015. Even the popular Notepad++ doesn't have an option for UTF-16 without BOM.

If you are just doing this for experimentation, use any hex editor to remove the BOM after saving. There is a binary editor built into VS2015: after saving a file as UTF-16LE with BOM, close it, re-open it with the Binary Editor, and remove the BOM. I found that VS2015 couldn't open the file correctly without it, though, which may be why the option isn't available.
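If a hex editor is inconvenient, the same BOM removal could be scripted. The following is a minimal sketch, assuming the file fits in memory; `strip_utf16le_bom` and `strip_bom_file` are hypothetical helper names, not part of any VS2015 tooling.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <iterator>
#include <string>
#include <vector>

// Returns a copy of the data without a leading UTF-16LE BOM (FF FE).
// Data that does not start with FF FE is returned unchanged.
std::vector<unsigned char> strip_utf16le_bom(const std::vector<unsigned char>& data) {
    const bool has_bom = data.size() >= 2 && data[0] == 0xFF && data[1] == 0xFE;
    const std::ptrdiff_t skip = has_bom ? 2 : 0;
    return std::vector<unsigned char>(data.begin() + skip, data.end());
}

// Hypothetical file-level wrapper: reads in_path as raw bytes and writes the
// BOM-less bytes to out_path. Returns false if either file cannot be
// opened or the write fails.
bool strip_bom_file(const std::string& in_path, const std::string& out_path) {
    std::ifstream in(in_path, std::ios::binary);
    if (!in) return false;
    const std::vector<unsigned char> data((std::istreambuf_iterator<char>(in)),
                                          std::istreambuf_iterator<char>());
    const std::vector<unsigned char> body = strip_utf16le_bom(data);
    std::ofstream out(out_path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(body.data()),
              static_cast<std::streamsize>(body.size()));
    return static_cast<bool>(out);
}
```

Calling `strip_bom_file` with the path of the file saved from VS2015 and a new output path should leave a copy whose first two bytes are the first UTF-16LE code unit of the source rather than FF FE. Writing to a separate output file keeps the original intact in case the IDE needs the BOM to re-open it.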

Mark Tolonen
  • It's strange that the author of the article I linked above mentions this possibility. But in reality, you can't test it. Anyway, thanks for the answer. – João Afonso Jan 29 '17 at 22:15
  • @JoãoAfonso The author was mentioning compiler switches for the source and execution character set. The compiler is not the IDE so compiling from the command line with the right switches is probably supported. – Mark Tolonen Jan 30 '17 at 00:09
  • But there is still an error in Tables 1 and 2 under the column `UTF16-LE w/ or w/o BOM`. The author says that the internal UTF-8 representation of the Chinese character would be `E5 B1 B0`, but that is not correct for the `w/o BOM` case. If the file doesn't have a signature, the translation would be from codepage 1252 to UTF-8, the same as the one attributed to the next column in the table, `UTF8 w/o BOM`, and the final representation of the character would differ from the one above, `E5 B1 B0`. – João Afonso Jan 30 '17 at 17:49