
I cannot get clang on 64-bit Linux to compile wchar_t as 16 bits.

printf("%i ", sizeof (wchar_t));

prints 4

clang++ -fshort-char test.cpp

although I use -fshort-char

My question is: how do I make wchar_t 16 bits? Is #define the only option?

#define wchar_t unsigned short
exebook
    You cannot `#define wchar_t`. It is generally undefined behavior and in this particular case will guarantee that none of the standard wide string functions (that are already compiled and would not be aware of the change) work. If you are going to write your own functions, and that may be your only recourse, don't call your wide char type `wchar_t` (the `_t` suffix is reserved by POSIX anyway). – Pascal Cuoq Jan 18 '14 at 08:51
  • By the same token, you shouldn't print the result of `sizeof` with `%i`. Use `%zu` if your compiler supports it, or `printf("%u ", (unsigned int) sizeof (wchar_t));` – Pascal Cuoq Jan 18 '14 at 08:52
  • Why does `wchar_t` have to be 16 bits for you? – Mike Kinghan Jan 19 '14 at 17:37
    @MikeKinghan, because 16 bits is very convenient to handle all common languages. Why double stretch every type out there? They could stretch boolean to 64 bits as well. – exebook Jan 20 '14 at 11:40

1 Answer


The way to make wchar_t be 16 bits with clang (or gcc) is to pass the compiler option -fshort-wchar (not -fshort-char).

This is a rather drastic measure, however, as it may break code that calls the Standard Library or third-party libraries passing wchar_t data. Note that by default wchar_t is 32 bits for gcc and clang regardless of whether the compiler targets 32-bit or 64-bit systems. In this respect they conform to the C Standard 7.17/2, which requires wchar_t to be:

an integer type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales

As you are invoking clang++ I presume you are compiling C++ source. If you simply require a character type that is 16 bits wide, and can compile with the option -std=c++11, the core language offers you char16_t, which is suitable for storing UTF-16 code units. (And should you wish to be able to store any UTF-32 code point directly, char32_t will do.)

Mike Kinghan
  • Note that Unicode requires a minimum of 21 bits to represent the initial standardized code points, and this would mean that a 16-bit wchar_t could never be used to represent all Unicode code points. On GNU/Linux the standard ABI for wchar_t is a 32-bit type, which is more than enough to represent all the current Unicode code points. – Carlos O'Donell Feb 23 '17 at 02:55