Consider this C program:

unsigned char c2 = '\0101';
printf("%c, %d\n", c2, c2);

I believe the output should be `A, 65`, but the actual output is `1, 49`.

Reasoning: a 0 prefix in a character constant declares it as octal, and octal 101 is 65, which is the ASCII code for A. Can someone tell me where I am going wrong? I tried the same code with the hexadecimal escape '\x41' and it gave the desired output.

dbush
Archer
  • I don't know if you notice btw, if you remove the 0 (`\101`), you get the desired output – Alexander Santos Dec 24 '19 at 16:42
  • @user3121023 Octal *escape* sequences. The octal literals don't have this restriction. – Eugene Sh. Dec 24 '19 at 16:43
  • @AlexanderSantos Yeah it works, but then how do we specify that the input is octal ? – Archer Dec 24 '19 at 16:43
  • @Archer `unsigned char c2 = 0101` works – Alexander Santos Dec 24 '19 at 16:44
  • but `unsigned char c2 = x41` doesn't work @AlexanderSantos – Archer Dec 24 '19 at 16:45
  • Because the syntax is `0x41` – Eugene Sh. Dec 24 '19 at 16:45
  • Note that in `'\xA74129'`, there are three bytes worth of hex; unlike octal, there is no limit on the number of hex characters that make up a hex escape `\xXXX`. Nor is there a requirement that the number of characters in a hex escape is even. See C11 [§6.4.4.4 Character constants](https://port70.net/~nsz/c/c11/n1570.html#6.4.4.4) where the one to three octal digits and unrestricted hex are both clearly specified. – Jonathan Leffler Dec 24 '19 at 16:47
  • Note that [¶11](https://port70.net/~nsz/c/c11/n1570.html#6.4.4.4p11) states: _The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined._ – Jonathan Leffler Dec 24 '19 at 16:52

2 Answers

The escape sequence for a character in octal format is a backslash followed by one to three octal digits, so in '\0101' the escape sequence ends after \010 and the final 1 is not part of it. This is specified in section 6.4.4.4p1 of the C standard regarding "Character constants":

octal-escape-sequence:
  \ octal-digit
  \ octal-digit  octal-digit
  \ octal-digit  octal-digit  octal-digit

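So '\0101' is actually a multi-character constant: the first character is \010, which has the value 8, and the second is the character '1'. The value of such a constant is implementation-defined; in your case, converting it to unsigned char left only the '1' (ASCII 49).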
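
One way to see the three-digit limit without relying on implementation-defined behavior is to put the same escape in a string literal, where each character is stored separately. This is only a minimal sketch, assuming an ASCII execution character set; `sample`, `sample_length`, and `sample_char` are hypothetical names introduced for illustration:

```c
#include <stddef.h>

/* The literal "\0101" holds two characters: the octal escape \010
   (value 8) followed by the ordinary character '1'. */
static const char sample[] = "\0101";

/* Number of characters in the literal, excluding the terminating NUL. */
size_t sample_length(void) {
    return sizeof sample - 1;
}

/* Value of the character at index i, widened to int. */
int sample_char(size_t i) {
    return (unsigned char)sample[i];
}
```

On an ASCII system, `sample_length()` returns 2, `sample_char(0)` returns 8, and `sample_char(1)` returns 49, which matches the `1` and `49` seen in the question.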

A leading 0 specifies an octal integer constant, not an octal character constant, so you don't need the leading 0 in this case:

unsigned char c2 = '\101';

If you did want to use a numeric octal constant, you would do this:

unsigned char c2 = 0101;
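
To check that these spellings agree, a small sketch along the following lines may help. The function name `same_value` is hypothetical; the hexadecimal form `0x41` is taken from the comments on the question:

```c
/* Returns 1 when the octal escape, the octal integer constant, and the
   hexadecimal integer constant all denote the same value (65). */
int same_value(void) {
    unsigned char from_escape = '\101'; /* octal escape, no leading 0 */
    unsigned char from_octal = 0101;    /* octal integer constant */
    unsigned char from_hex = 0x41;      /* hexadecimal integer constant */
    return from_escape == from_octal && from_octal == from_hex;
}
```

All three variables hold 65, so the function returns 1; whether 65 prints as A depends on the execution character set being ASCII-compatible.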
dbush

This '\0101' is a multi-character integer character constant. It has the type int, and its value is implementation-defined. So internally it can be represented like

0x00000831

where 0x08 is the escape sequence \010 and 0x31 is the character '1' in ASCII.

In this declaration

unsigned char c2 = '\0101';

the constant is converted to unsigned char, which keeps only the least significant byte, 0x31 (decimal 49), and that value is assigned to the variable c2.

So this value 0x31 is output as the character 1 and as the integer 49 in the printf call.
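
The conversion step can be sketched separately from the constant itself. The code below assumes 8-bit bytes and a compiler that packs the characters of a multi-character constant high byte first (the scheme GCC documents; other compilers may differ). `pack_two` and `to_uchar` are hypothetical helpers, not part of any library:

```c
/* Assumed packing scheme: shift the first character value left by one
   8-bit byte and OR in the second. Other compilers may pack differently. */
int pack_two(int first, int second) {
    return (first << 8) | second;
}

/* Converting to unsigned char reduces the value modulo UCHAR_MAX + 1,
   which with 8-bit bytes keeps only the least significant byte. */
unsigned char to_uchar(int value) {
    return (unsigned char)value;
}
```

Under those assumptions, `pack_two(010, 0x31)` yields 0x0831 and `to_uchar(0x0831)` yields 0x31, which is decimal 49, the ASCII code for the character 1.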

Vlad from Moscow