I am trying to figure out what character set putchar uses. Seemingly, it cannot print multi-byte characters:

putchar('€') //gcc warning: multi-character character constant

But when the codepage of the terminal in Windows is set to 1252 (West European Latin) with chcp 1252, the following code is able to print the Euro sign:

putchar(128)

But still, even though the terminal's charset is set to 1252, putchar('€') cannot print the Euro sign.

Can anybody please explain the above (seeming) discrepancy to me?

Thank you very much.

Spikatrix

1 Answer

In C, char for all practical purposes means "byte", not "character".

Your source file is most likely encoded in UTF-8, where the euro symbol is encoded as the following 3 bytes: 0xE2 0x82 0xAC.

putchar, as the name implies, writes a single byte. C as a language has no notion of "characters" or "encodings", and GCC by default uses the exact bytes it finds in the source file. So '€' becomes a multi-character constant (hence the warning), and putchar, which converts its int argument to unsigned char, writes only the byte 0xAC (the least significant byte of '€') to standard output. It doesn't matter what the character looks like in your editor or what encoding the file is supposed to be in. GCC doesn't care; it copies the bytes as-is.

What the terminal displays for a given stream of bytes from a program depends solely on the settings of that terminal. If you want to display UTF-8 encoded text in the Windows terminal, you should enter chcp 65001 and change the font to a TrueType font such as Lucida Console.

Your editor displays the bytes according to one encoding, and the terminal displays the same bytes according to its own. So (as long as you use GCC or Clang with default settings) if the editor and the terminal use the same encoding, you should see the same characters in both programs.

EDIT: A few remarks about how GCC handles encodings:

There are two options: -finput-charset and -fexec-charset. GCC treats bytes in narrow string and char literals literally only if those two options are identical. If they are not, GCC converts them from input encoding to exec encoding.

After a bit of testing, I conclude that for some reason your GCC runs with Windows-1250 as input encoding and UTF-8 as exec encoding.

If you want to make really sure you are using the right encoding, add -finput-charset=cp1250 -fexec-charset=cp1250 to the compiler options.

Also, this way you can make your program run in the default encoding of your console if you so desire.

Karol S
  • Many thanks for the explanation. Although I was already aware of most of what you described, your explanation made things clearer for me. What still confuses me is that I cannot make the following conversion: `printf("%c\n", 128)`; `printf("%d\n", "€");` Under the same environment and the same encoding settings, I expect the first statement to give '€' and the second one to give 128. – c_enthusiast Feb 18 '15 at 12:22
  • The apparent result of the first depends on the encoding of the terminal. The result of the second (and assuming that you wrote `'€'` not `"€"`) depends mostly on the encoding of the source file. Those may be two different encodings. Please check the encoding of the source file again. – Karol S Feb 18 '15 at 14:36
  • As far as I know multi-byte characters (those beyond ASCII range) cannot be represented with single-quotes, therefore I used double-quotes to represent "€". The encoding of the source file and the terminal is 1252, by the way. – c_enthusiast Feb 19 '15 at 06:55
  • "multibyte" and "Windows-1252" are a contradiction. You cannot have multibyte characters in a Windows-1252 file. How are you invoking the compiler? Also, I'll add a few things to the answer in a moment. – Karol S Feb 19 '15 at 10:13
  • Yes you are right, '€' is (certainly) not multibyte in codepage 1252. I put it wrong, sorry. Normally, I am invoking the compiler with standard options, that is: `gcc test.c -o test`. Following your advice, I added the options `-finput-charset=cp1252 -fexec-charset=cp1252`. Now the output is as I expected (somehow). The only surprising thing is that I get a negative number (-128). – c_enthusiast Feb 19 '15 at 13:03