10

I am reading the book C in a Nutshell. After reading the section "Character Sets", which covers wide characters, I wrote this program:

#include <stdio.h>
#include <stddef.h>
#include <wchar.h>

int main() {
  wchar_t wc = '\x3b1';
  wprintf(L"%lc\n", wc);
  return 0;
}

I then compiled it using gcc, but gcc gave me this warning:

main.c:7:15: warning: hex escape sequence out of range [enabled by default]

The program also does not print the character α (Unicode code point U+03B1), which is what I wanted it to do.

How do I change the program to print the character α?

dda
Yishu Fang

3 Answers

8

This works for me:

#include <stdio.h>
#include <stddef.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
  wchar_t wc = L'\x3b1';

  setlocale(LC_ALL, "en_US.UTF-8");
  wprintf(L"%lc\n", wc);
  return 0;
}
David Ranieri
  • You can replace LC_ALL with LC_CTYPE (this category applies to classification and conversion of characters, and to multibyte and wide characters). – David Ranieri Oct 21 '12 at 09:17
  • This does not work on Windows; the locale name is different, and the console does not speak UTF-8 by default. – rubenvb Oct 21 '12 at 09:47
  • I know, but he is working on Ubuntu. Do you suggest anything other than conditional compilation? – David Ranieri Oct 21 '12 at 10:03
  • You could use `setlocale(LC_ALL, "")` to use the locale configured in the execution environment, be it Linux or Windows. – Joni Oct 21 '12 at 16:42
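Joni's suggestion can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name `print_wide_char` is made up here, and the sketch assumes that nothing has been written to stdout with the narrow (byte-oriented) functions beforehand.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper (not from the thread): adopt the locale of the
   execution environment, then print one wide character and a newline.
   Returns 0 on success and -1 if wprintf reports an error. */
int print_wide_char(wchar_t wc)
{
    /* "" asks for the locale configured in the environment (LANG, LC_*)
       instead of a hard-coded name such as "en_US.UTF-8". If that locale
       is unavailable, setlocale returns NULL and the program simply
       stays in the default "C" locale. */
    (void)setlocale(LC_ALL, "");

    /* A negative return value from wprintf signals an output or
       encoding error. */
    return wprintf(L"%lc\n", wc) < 0 ? -1 : 0;
}
```

Calling `print_wide_char(L'\x3b1')` from `main` should show α on a terminal whose configured locale uses UTF-8; what appears under other locales depends on the environment.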
7
wchar_t wc = L'\x3b1';

is the correct way to initialise a wchar_t variable to U+03B1. The L prefix specifies a wide character constant of type wchar_t. Your code uses an ordinary character constant, whose hex escape must fit in a single char, and that's why the compiler is warning.

The fact that you don't see the desired character when printing is down to your local environment's console settings.
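To make the difference concrete, here is a small sketch. The function names are invented for illustration, and it assumes a platform where wchar_t is at least 16 bits wide, as on Linux and Windows.

```c
#include <limits.h>
#include <wchar.h>

/* Hypothetical helper: the L prefix makes this a wide character constant,
   so the hex escape only has to fit in wchar_t. */
wchar_t greek_alpha(void)
{
    return L'\x3b1';
}

/* Hypothetical helper: a hex escape in an ordinary character constant
   has to fit in a single unsigned char. Returns 1 if value fits. */
int fits_in_narrow_char(unsigned long value)
{
    return value <= UCHAR_MAX;
}
```

With 8-bit chars, `fits_in_narrow_char(0x3b1)` is 0, which is the situation gcc's "hex escape sequence out of range" warning describes.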

David Heffernan
  • Hmm, how do I set up my console? – Yishu Fang Oct 21 '12 at 08:21
  • I've got no idea. You didn't state what OS you are on. Also, you didn't ask about that. You asked about how to initialise the variable. – David Heffernan Oct 21 '12 at 08:24
  • I am using Ubuntu Linux. Can you get the right result on your computer? I thought the problem was caused by the initialization; I had never thought about the environment before. – Yishu Fang Oct 21 '12 at 08:27
  • Your environment almost certainly prefers UTF-8. Use char* instead and encode using UTF-8, that is 0xCE 0xB1. Print with printf. – David Heffernan Oct 21 '12 at 08:31
  • I just want to experiment with using wchar_t to output a character; I don't want to use char* here. I still don't know how to solve my problem... – Yishu Fang Oct 21 '12 at 08:45
  • Neither do I. I don't use Linux. Trying to use UTF-16 or UTF-32 on a Linux console is akin to using UTF-8 on a Windows console. I think I answered the question you asked concerning variable initialization. – David Heffernan Oct 21 '12 at 08:50
  • Hmm... Aren't hex integer constants dependent on endianness? The standard C way to initialize Unicode characters is by using the new `u` or `U` prefixes. – rubenvb Oct 21 '12 at 09:11
  • @rubenv No, \x3b1 is the same value no matter what the endianness is. If it was as you said, the entire language would be utterly useless. – David Heffernan Oct 21 '12 at 09:45
  • @Prototype "Worked for me" isn't how to approach programming. Trial and error is not a valid programming technique. You need to understand the rules enough to be able to analyse and reason about a problem statically. – David Heffernan Oct 21 '12 at 09:47
  • @DavidHeffernan what works or doesn't isn't the question here. The question was THE CORRECT way to initialize a wchar_t variable. C standard says use '\uxxxx' and not initialize with hex-strings. Though it might work now, no telling it will work on my machine. – Aniket Inge Oct 21 '12 at 09:49
  • @PrototypeStark Which C standard? The older and more commonly used ones don't have that syntax. You can't just say "the C standard". You have to specify which one and the question doesn't do so. Your answer seems quite happy to use the exact same code as mine. Except that you had some strange talk about needing a leading `0`. No idea what that was all about. – David Heffernan Oct 21 '12 at 09:53
  • I don't understand why this answer gets downvoted when the other answers that have the same code but omit any attempt at explanation do not receive downvotes. Am I failing to understand something? – David Heffernan Oct 21 '12 at 13:06
  • @DavidHeffernan: There is only one C standard. All the old ones have been withdrawn. Only if you want to talk about an obsolete version do you really need to specify which one you are talking about. – CB Bailey Oct 22 '12 at 07:35
  • @CharlesBailey Do you think that the person asking the question, who is learning C, working from a book based on a withdrawn standard, is working to that convention? – David Heffernan Oct 22 '12 at 07:57
  • I don't know (I doubt it) but isn't that a question for @PrototypeStark, not me? – CB Bailey Oct 22 '12 at 08:04
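The UTF-8 route mentioned in the comments (U+03B1 encoded as 0xCE 0xB1 and printed with printf) can be sketched like this. The encoder below is a hypothetical helper written for this illustration, not something from the thread or from a library.

```c
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes 1 to 4 bytes plus a terminating '\0' into out and returns the
   number of bytes written, or 0 for an invalid code point. */
size_t utf8_encode(unsigned long cp, char out[5])
{
    size_t n;

    if (cp < 0x80) {                 /* 1 byte: 0xxxxxxx */
        out[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800) {         /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {       /* 3 bytes, surrogates excluded */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            out[0] = '\0';
            return 0;
        }
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFF) {     /* 4 bytes */
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {                         /* beyond the Unicode range */
        out[0] = '\0';
        return 0;
    }
    out[n] = '\0';
    return n;
}
```

For 0x3B1 the helper produces the two bytes 0xCE 0xB1. Passing the buffer to `printf("%s\n", buf)` writes those bytes unchanged, so a terminal that expects UTF-8 displays α without any locale setup in the program.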
-1

Try L'\x03B1'; it might just solve your problem. If you're in doubt, you can try:

'\u03b1' to initialize.
Aniket Inge
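For comparison, universal character names (`\uXXXX`, available since C99) can be combined with the L prefix. The sketch below uses made-up function names and assumes an implementation where wchar_t holds Unicode code points (one that defines `__STDC_ISO_10646__`, such as glibc); elsewhere the two spellings are not guaranteed to yield the same value.

```c
#include <wchar.h>

/* Hypothetical helper: alpha written as a universal character name. */
wchar_t alpha_ucn(void)
{
    return L'\u03b1';
}

/* Hypothetical helper: alpha written as a hex escape, as in the answers. */
wchar_t alpha_hex(void)
{
    return L'\x3b1';
}

/* Hypothetical helper: universal character names also work inside wide
   string literals. This one holds alpha, beta and gamma. */
const wchar_t *greek_abc(void)
{
    return L"\u03b1\u03b2\u03b3";
}
```

Under that assumption `alpha_ucn()` and `alpha_hex()` compare equal, and `wcslen(greek_abc())` is 3.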