
In this link, unsigned wchar_t is typedef'd as WCHAR. But I can't find this kind of typedef in my SDK's winnt.h or in MinGW's winnt.h.

Is wchar_t signed or unsigned?

I am using the WinAPI in C.

2vision2
  • A similar question: http://stackoverflow.com/questions/2395514/is-wchar-t-just-a-typedef-of-unsigned-short – Andriy Aug 14 '12 at 13:37
  • I think that page is just incorrect. The library *once upon a time* used to use `unsigned short` when the compiler didn't have a built in `wchar_t` type. Guess the `unsigned` was just left there by mistake when changing to `wchar_t`. – Bo Persson Aug 14 '12 at 15:25
  • Signed or unsigned, you shouldn't be using it. See utf8everywhere.org – Pavel Radzivilovsky Aug 14 '12 at 19:28
  • @Pavel: In general, sure, but when you need to write glue code, or compiler tests, or string decoders for a debugger, or any number of other valid use cases you don't have a choice but to use `wchar_t`. Blanket absolutes tend not to be very helpful. – Cameron Nov 29 '16 at 19:46

5 Answers


The signedness of wchar_t is unspecified. The C++ standard only says (3.9.1/5):

Type wchar_t shall have the same size, signedness, and alignment requirements (3.11) as one of the other integral types, called its underlying type.

(By contrast, the types char16_t and char32_t are expressly unsigned.)

Kerrek SB
  • The [Windows API](http://msdn.microsoft.com/en-us/library/windows/desktop/aa367308(v=vs.85).aspx) seems to define it as unsigned though. – netcoder Aug 14 '12 at 13:40
  • @netcoder: "unspecified" doesn't mean that nobody is allowed to define it. It just means that the standard doesn't mandate either signedness. – Kerrek SB Aug 14 '12 at 13:42
  • Yeah, I know what the standard says, and I know how it works. The question is tagged `winapi` though, so I think this extra bit of info is still useful. – netcoder Aug 14 '12 at 13:43
  • @netcoder Yeah, it's useful. Thanks. Have a look at the link in my post. – 2vision2 Aug 14 '12 at 13:47
  • @user1317084: Is your question about C, or about how WinAPI implements certain implementation-defined aspects of C? It would be nice if you could clarify that. – Kerrek SB Aug 14 '12 at 13:50
  • Do you know if gcc's flags `-funsigned-char` or `-fsigned-char` affect it? In other words, is there any way to control its signedness? – Antonio Apr 18 '16 at 13:11
  • @Antonio: I'm afraid I don't know, nor does the GCC documentation seem to describe it. Maybe ask on the GCC mailing lists? – Kerrek SB Apr 18 '16 at 13:24
  • @Antonio, those flags only affect `char`; I'm pretty sure you can't change whether `wchar_t` is signed. – Jonathan Wakely Apr 18 '16 at 13:27

Be aware that the size of the type varies by platform.

Windows uses UTF-16, and wchar_t is 2 bytes. Linux uses a 4-byte wchar_t.

  • On most of the Linux systems I've seen, `wchar_t` is a 32-bit type, presumably meant for UTF-32 data. – jamesdlin Aug 15 '12 at 01:57
  • Fixed. It's been a few years since I worked with Unicode - I thought I remembered Linux using UTF-8, but if so, why have a four-byte wchar_t? –  Aug 15 '12 at 09:37
  • Most modern Linux systems *do* use UTF-8 normally. That's what `char` is for. A 32-bit `wchar_t` is useful for UTF-32 where you want a fixed-width encoding. – jamesdlin Aug 15 '12 at 10:42

The standard may not specify whether wchar_t is signed or unsigned, but Microsoft does. Even if your non-Microsoft compiler disagrees, the Windows API uses this definition, from the documentation for /Zc:wchar_t (wchar_t Is Native Type):

Microsoft implements wchar_t as a two-byte unsigned value. It maps to the Microsoft-specific native type __wchar_t.

Mark Ransom

Type WCHAR, not wchar_t, is defined on MSDN as follows:

    #if !defined(_NATIVE_WCHAR_T_DEFINED)
    typedef unsigned short WCHAR;
    #else
    typedef wchar_t WCHAR;
    #endif

https://learn.microsoft.com/en-us/windows/win32/extensible-storage-engine/wchar

So you could conclude that it's defined as unsigned on Windows?

user2997204

I just tested on several platforms, with no optimisation.

1) MinGW (32-bit) + gcc 3.4.4:
---- snip ----
#include<stdio.h>
#include<wchar.h>
const wchar_t BOM = 0xFEFF;
int main(void)
{
    int c = BOM;
    printf("0x%08X\n", c+0x1000);
    return 0;
}
---- snip ----

It prints 0x00010EFF, so wchar_t is unsigned. The corresponding assembly code is movzwl _BOM, %eax: movZwl (zero-extend), not movSwl (sign-extend).

2) FreeBSD 11.2 (64-bit) + clang 6.0.0:
---- snip ----
#include<stdio.h>
#include<wchar.h>
const wchar_t INVERTED_BOM = 0xFFFE0000;
int main(void)
{
    long long c = INVERTED_BOM;
    printf("0x%016llX\n", c+0x10000000LL);
    return 0;
}
---- snip ----

It prints 0x000000000FFE0000, so wchar_t is signed. The corresponding assembly code is movq $-131072, -16(%rbp): the 32-bit 0xFFFE0000 is converted to the 64-bit signed value -131072.

3) Same code as 2), on RedHat (version unknown) + gcc 4.4.7: it again prints 0x000000000FFE0000, so wchar_t is signed.

I tested neither printf's implementation nor WinAPI's WCHAR definition; I only tested the behavior of the compiler's built-in wchar_t type (whose signedness no header file specifies) and the compiler's C-to-assembly code generation.

Note that the compilers in 1) and 3) come from the same vendor, the GNU Project, yet they disagree. The answer definitely depends on the platform. (Would somebody test on Visual C++?)