5

I'm aware that there is already a standard method by prefixing with L:

wchar_t *test_literal = L"Test";

The problem is that wchar_t is not guaranteed to be 16 bits wide, but my project needs a 16-bit wchar_t. I'd also like to avoid requiring -fshort-wchar.

So, is there any prefix for C (not C++) that will allow me to declare a UTF-16 string literal?

  • "*I need a 16-bit `wchar_t`*" - why? – melpomene Jun 02 '18 at 14:28
  • 2
    @melpomene 1. I am on an embedded platform. 2. It is part of a Windows-like API. –  Jun 02 '18 at 14:29
  • 1
    What's wrong with `-fshort-wchar`? – melpomene Jun 02 '18 at 14:29
  • 1
    @melpomene The prefix will be part of a header file, included by my library and an application. I don't want to force the application to use `-fshort-wchar`. –  Jun 02 '18 at 14:30
  • This feels like some sort of XY problem. – melpomene Jun 02 '18 at 14:31
  • 1
    You'd be better off initialising as they are, and provide a conversion function to convert the literal to an array of whatever type you use to specifically represent UTF-16 characters (`short`, `int16_t`), or whatever. That will make it easier on systems where `wchar_t` and `UTF-16` are not the same. – Peter Jun 02 '18 at 14:32
  • @melpomene Yeah... I want to have a `WCHAR` type, and a `TEXT` macro, like Windows. –  Jun 02 '18 at 14:32
  • But why? What is the overall problem you're trying to solve here? – melpomene Jun 02 '18 at 14:33
  • @melpomene I want to be able to switch between ASCII and Unicode. So, I would make a `TEXT` macro that took a literal as a parameter, and depending on whether the library was built for ASCII or Unicode, optionally prefix the literal to turn it into a wchar_t. –  Jun 02 '18 at 14:34
  • Yes, but *why*? – melpomene Jun 02 '18 at 14:35
  • Otherwise I have to use an ugly array. `wchar_t str[4] = { 'T', 'e', 's', 't' }` –  Jun 02 '18 at 14:35
  • No, you could just provide a single UTF-8 interface. Why force applications to recompile if they want to use Unicode? – melpomene Jun 02 '18 at 14:36
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/172306/discussion-between-mark-yisri-and-melpomene). –  Jun 02 '18 at 14:36

2 Answers

7

So, is there any prefix for C (not C++) that will allow me to declare a UTF-16 string literal?

Almost, but not quite. C2011 offers you these options:

  • character string literals (elements of type char) - no prefix. Example: "Test"
  • UTF-8 string literals (elements of type char) - 'u8' prefix. Example: u8"Test"
  • wide string literals of three flavors:
    • wchar_t elements - 'L' prefix. Example: L"Test"
    • char16_t elements - 'u' prefix. Example: u"Test"
    • char32_t elements - 'U' prefix. Example: U"Test"

Note well, however, that although you can declare a wide string literal having elements of type char16_t, the standard does not guarantee that the UTF-16 encoding will be used for them, nor does it make any particular requirements on which characters outside the language's basic character set must be included in the execution character set. You can test the former at compile time, however: if char16_t represents UTF-16-encoded characters in a given conforming implementation, then that implementation will define the macro __STDC_UTF_16__ to 1.
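As a minimal sketch of that compile-time test, assuming a C2011 compiler that ships uchar.h: the build stops when __STDC_UTF_16__ is not defined, and otherwise declares a char16_t literal. The names test_literal and utf16_len are made up for illustration and do not come from any library.

```c
#include <assert.h>   /* static_assert (C2011) */
#include <stddef.h>   /* size_t */
#include <uchar.h>    /* char16_t */

/* Refuse to build where u"..." is not guaranteed to be UTF-16. */
#ifndef __STDC_UTF_16__
#error "char16_t literals are not guaranteed to be UTF-16 on this implementation"
#endif

/* A u"..." literal is an array of char16_t, terminator included. */
static const char16_t test_literal[] = u"Test";

static_assert(sizeof test_literal / sizeof test_literal[0] == 5,
              "four code units plus the terminating zero");

/* Hypothetical helper: counts UTF-16 code units before the terminator. */
static size_t utf16_len(const char16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

Note that utf16_len counts code units, not characters: a character outside the Basic Multilingual Plane occupies two code units (a surrogate pair).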

Note also that you need to include (C's) uchar.h header to use the char16_t type name, but the u"..." syntax for literals does not depend on that. Take care, as this header name collides with one used by the C interface of the International Components for Unicode, a relatively widely-used package for Unicode support.
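For the Windows-like WCHAR type and TEXT macro mentioned in the question's comments, one possible sketch built on char16_t follows. It assumes a C2011 compiler. The names UNICODE, TCHAR, WCHAR, TEXT, TEXT_ and greeting are illustrative, modelled on the Windows convention, and are not taken from any real header.

```c
#include <uchar.h>   /* char16_t */

/* Hypothetical names modelled on the Windows API convention. */
typedef char16_t WCHAR;

#ifdef UNICODE
typedef char16_t TCHAR;
#define TEXT_(s) u##s      /* paste the u prefix onto the literal */
#else
typedef char TCHAR;
#define TEXT_(s) s         /* leave the literal as a plain char string */
#endif

/* Extra level of indirection so a macro argument is expanded first. */
#define TEXT(s) TEXT_(s)

static const TCHAR greeting[] = TEXT("Test");
```

Building with -DUNICODE switches greeting to an array of char16_t. The application still has to compile in C2011 mode or later, but it does not need -fshort-wchar.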

Finally, be aware that much of this was new in C2011. To make use of it, you need a conforming C2011 implementation. Those are certainly available, but so are a lot of implementations that conform only to earlier standards, or even to none. Standard C99 and earlier do not provide a string literal syntax that guarantees 16-bit elements.

John Bollinger
-2

You need a 16-bit wchar_t, but its width is out of your control. If the compiler says it's 32 bits, then it's 32 bits, and it doesn't matter what you want or need.

The string classes are templated. You can always use a template to create a template class with 16-bit characters. I personally would try to remove any Unicode handling that is not UTF-8.

An alternative is a clever #ifdef that produces a compile-time error if wchar_t is not 16 bits wide, so that you solve the problem only when you actually need to solve it.
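A rough sketch of such a check, using #if on WCHAR_MAX rather than #ifdef because the preprocessor cannot evaluate sizeof. It assumes 8-bit bytes. The macro names WCHAR_IS_16_BIT and REQUIRE_16_BIT_WCHAR are hypothetical; the error fires only when the build defines REQUIRE_16_BIT_WCHAR.

```c
#include <stdint.h>  /* WCHAR_MAX, usable in #if */
#include <wchar.h>   /* wchar_t */

/* A 16-bit wchar_t has a maximum of 0xFFFF (unsigned) or 0x7FFF (signed). */
#if WCHAR_MAX == 0xFFFF || WCHAR_MAX == 0x7FFF
#define WCHAR_IS_16_BIT 1
#else
#define WCHAR_IS_16_BIT 0
#endif

/* Opt-in hard failure for builds that cannot work with a wider wchar_t. */
#if defined(REQUIRE_16_BIT_WCHAR) && !WCHAR_IS_16_BIT
#error "this library needs a 16-bit wchar_t (with GCC or Clang, consider -fshort-wchar)"
#endif
```

This works on pre-C2011 compilers as well, since WCHAR_MAX has been available since C99.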

gnasher729
  • 5
    Templated string classes? In C? – melpomene Jun 02 '18 at 14:47
  • I think I will have to use the `#ifdef` and `-fshort-wchar`. It's the only method that is guaranteed to work. –  Jun 02 '18 at 14:49
  • 1
    Indeed `wchar_t` is not guaranteed to be 16-bit -- it could be either more or less -- but C2011 *does* have `char16_t`, which is at least 16 bits wide (exactly 16 on most implementations), and a syntax for wide string literals having elements of that type. – John Bollinger Jun 02 '18 at 14:49
  • @JohnBollinger Problem is that not all compilers support C2011 yet (and I think especially embedded toolchains). –  Jun 02 '18 at 14:52
  • 1
    That's quite true, @MarkYisri, but C2011 is the current C standard, and it's not even that new any more. Whereas we can and should recognize that some relevant implementations do not conform to that version, questions that are not otherwise qualified should be interpreted first in light of the current version of the language. – John Bollinger Jun 02 '18 at 14:59