47

Example:

char arr[] = "\xeb\x2a";

BTW, are the following the same?

"\xeb\x2a" vs. '\xeb\x2a'

Peter Mortensen
user198729

7 Answers

44

\x indicates a hexadecimal character escape. It's used to specify characters that aren't typeable (like a null '\x00').

And "\xeb\x2a" is a string literal (type is char[3]: two bytes plus a terminating null; it decays to char * in most expressions), while '\xeb\x2a' is a multi-character constant (type is int, not null-terminated, with an implementation-defined value; on many compilers it is just another way to write 0xEB2A or 60202 or 0165452). Not the same :)

Seth
  • 4
    "type is `char`, 2 bytes" - hm, I don't think that's generally going to fit into a `char`. – Cascabel Mar 30 '10 at 17:56
  • @Jefromi True enough, I suppose the type is more accurately described as a `char[2]`. Updated. – Seth Mar 30 '10 at 18:07
  • Is `'\xeb\x2a'` the same as `char[0]='\xeb'`,`char[1]='\x2a'`? – user198729 Mar 30 '10 at 18:11
  • 4
    `'\xeb\x2a'` does **not** have a type char[2] - it's an `int` with a value that's implementation-defined. – Michael Burr Mar 30 '10 at 18:15
  • 2
    What about `'\xeb\x2a\xeb\x2a\xeb'` and `'\xeb\x2a\xeb'` then? – user198729 Mar 30 '10 at 18:17
  • @user198729: multibyte character constants are a language extension. Nonetheless, character constants have a type of `int` in C, so providing more than an `int`'s worth of data makes little sense anyway. – Evan Teran Mar 30 '10 at 18:19
  • @Evan Teran: They're not a language extension; they are allowed in C, it's just that their value is implementation defined. – CB Bailey Mar 30 '10 at 18:21
  • And to answer your question: `int x = '\x01\x02\x03\x04\x05';` yields a warning: `warning: character constant too long for its type` which has an implementation defined (perhaps undefined is more accurate?) value. – Evan Teran Mar 30 '10 at 18:22
  • @Charles Bailey, can you explain what different implementations there are for `'\xeb\x2a\xeb\x2a\xeb'` and `'\xeb\x2a\xeb'`? – user198729 Mar 30 '10 at 18:22
  • @Evan Teran, so a character constant (`'xx..'`) has a limited length (<= `int`, or say 4 characters maximum, `'abcd'`), right? – user198729 Mar 30 '10 at 18:23
  • @user198729: a character constant has a type of `int` no matter how many bytes you shove at it. – Evan Teran Mar 30 '10 at 18:24
  • @user198729: No, sorry, I really don't know. That the results are implementation defined makes using them inherently less portable. I've never had a need for them so I've never investigated what any implementations that I use actually specify. – CB Bailey Mar 30 '10 at 18:25
  • @Charles Bailey: perhaps we are confusing terms. The standard speaks about multibyte characters with reference to character sets (as in things like UTF-8 and such). Which is not what is being talked about here. What I am referring to is a character constant which is written like so: `int x = '\x02\x03';` which I do not think is in the standard at all and thus would be a language extension. But I could be wrong too, do you have a reference for why it isn't? – Evan Teran Mar 30 '10 at 18:26
  • @user198729: I've found through **basic** tests that gcc tends to use the least significant bytes (at most 4) of a character constant written like you've done. So `int x = '\x01\x02\x03\x04\x05'; printf("%08x\n", x);` yields "02030405" – Evan Teran Mar 30 '10 at 18:28
  • @Evan Teran: Please see 6.4.4.4/10. "... The value of an integer character constant containing more than one character..." – CB Bailey Mar 30 '10 at 18:31
  • @Charles Bailey: you are correct: 6.4.4.4p10 says "An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined." – Evan Teran Mar 30 '10 at 18:33
  • 4-byte character constants are extremely handy. (Pre-)Carbon Mac OS used them extensively. 'TEXT' is so much more readable than 0x54455854. – Seth Apr 01 '10 at 01:59
  • Could you tell me what "\x01C" means? 1.5 bytes? – Scott 混合理论 Aug 28 '15 at 10:08
  • @Scott混合理论 - It means 0x1C, a single byte, because of the leading 0. Half-bytes ("nybbles") are very uncommon, and probably not part of the C language. – Seth Aug 28 '15 at 15:55
  • @Seth VC6 use this half-bytes? – Scott 混合理论 Sep 01 '15 at 03:44
  • @Scott混合理论 - No. By very uncommon, I mean there is no intrinsic type that stores half a byte (with the possible exception of custom processors). You can use half a byte, but the data will be stored in a full byte in hardware. – Seth Sep 01 '15 at 18:47
11

As others have said, the \x is an escape sequence that starts a "hexadecimal-escape-sequence".

Some further details from the C99 standard:

When used inside a set of single-quotes (') the characters are part of an "integer character constant" which is (6.4.4.4/2 "Character constants"):

a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'.

and

An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.

So the sequence in your example, '\xeb\x2a', has an implementation-defined value. It's likely to be the int value 0xeb2a or 0x2aeb, depending on how the compiler packs the bytes (which may or may not follow the endianness of the target platform), but you'd have to look at your compiler's documentation to know for certain.

When used inside a set of double-quotes (") the characters specified by the hex-escape-sequence are part of a null-terminated string literal.

From the C99 standard 6.4.5/3 "String literals":

The same considerations apply to each element of the sequence in a character string literal or a wide string literal as if it were in an integer character constant or a wide character constant, except that the single-quote ' is representable either by itself or by the escape sequence \', but the double-quote " shall be represented by the escape sequence \".


Additional info:

In my opinion, you should avoid using 'multi-character' constants. There are only a few situations where they provide any value over using a regular, old int constant. For example, '\xeb\x2a' could more portably be specified as 0xeb2a or 0x2aeb, depending on what value you really wanted.

One area that I've found multi-character constants to be of some use is to come up with clever enum values that can be recognized in a debugger or memory dump:

enum CommandId {
    CMD_ID_READ  = 'read',
    CMD_ID_WRITE = 'writ',
    CMD_ID_DEL   = 'del ',
    CMD_ID_FOO   = 'foo '
};

There are few portability problems with the above (other than platforms that have small ints, or warnings that might be spewed). Whether the characters end up in the enum values in little- or big-endian form, the code will still work (unless you're doing something else unholy with the enum values). If the characters end up in the value using an endianness that wasn't what you expected, it might make the values less easy to read in a debugger, but the 'correctness' isn't affected.

Michael Burr
  • Is `'\xeb\x2a'` the same as `char[0]='\xeb',char[1]='\x2a'`? – user198729 Mar 30 '10 at 18:15
  • No, as the answer states, `'\xeb\x2a'` is an `int` with an implementation-defined value. The value is almost certainly either 0xeb2a or 0x2aeb; which one will almost certainly depend on the endianness of the platform. I suppose some compilers might transform `'\xeb\x2a'` to 0xffffeb2a if they decide to do sign extension. I have no idea how likely that might be. – Michael Burr Mar 30 '10 at 18:21
  • I also want to point out that in C++, '\x50' is of type `char` instead of `int`, which is different from C. – Searene Sep 10 '18 at 13:39
3

When you say:

BTW,are these the same:

"\xeb\x2a" vs '\xeb\x2a'

They are in fact not. The first creates a character string literal, terminated with a zero byte, containing the two characters whose hex representation you provide. The second creates an integer constant.

  • 1
    Can you elaborate a little about `'\xeb\x2a'`? – user198729 Mar 30 '10 at 18:04
  • '\xeb' is a single character (=a byte) so '\xeb\x2a' is two bytes = 16bit = short int – Martin Beckett Mar 30 '10 at 18:11
  • @Martin Beckett: No, a character constant which contains more than one character is still an `int`, not a `short int`. Its value is implementation defined. – CB Bailey Mar 30 '10 at 18:18
  • @user198729 What Charles said. In C, anything between single quotes is converted to an integer. So in ASCII 'A' is converted to 65. If you say 'AB', then that might get converted to (65 << 8) + 66. Or it might not - the conversion is implementation defined. –  Mar 30 '10 at 18:26
  • @Charles Bailey - sorry, I was trying to say that a two digit \x was a byte and so two of them was a 16bit value (ie a short int). But yes in C even just '0' is an 'integer'. – Martin Beckett Mar 30 '10 at 18:43
  • @Martin Beckett: No need to be sorry, I was just trying to correct what looked to me like a point of fact. – CB Bailey Mar 30 '10 at 18:50
1

It's an escape sequence that indicates the characters that follow are a hexadecimal character code.

http://www.austincc.edu/rickster/COSC1320/handouts/escchar.htm

badcodenotreat
  • 1
    It's probably best to actually provide the full answer here, rather than linking the OP to a place where they can find it - and that link doesn't have any more explanation than "hexadecimal escape character". – Cascabel Mar 30 '10 at 17:53
1

The \x means it's a hex character escape. So \xeb would mean character eb in hex, or 235 in decimal. See http://msdn.microsoft.com/en-us/library/6aw8xdf2.aspx for more information.

As for the second, no, they are not the same. The double-quotes, ", means it's a string of characters, a null-terminated character array, whereas a single quote, ', means it's a single character, the byte that character represents.

Tarka
1

\x allows you to specify the character by its hexadecimal code.

This allows you to specify characters that are normally not printable (some of which have special escape sequences predefined, such as '\n' = newline, '\t' = tab and '\b' = backspace).

Loopo
0

A useful website is here.

And I quote:

x Unsigned hexadecimal integer

That way, your \xeb is like 235 in decimal.

Peter Mortensen
SeargX
  • 1
    That link is to a description of the format strings consumed by `printf()` and the like, not the escape sequences understood by a C literal. It is also C++ specific, but since the OP is asking about the mythical language C/C++ I'll let that pass. – RBerteig Jul 01 '17 at 00:45