71

I'm attempting to decode text which is prefixing certain 'special characters' with \x. I've worked out the following mappings by hand:

\x28   (
\x29   )
\x3a   :

e.g. 12\x3a39\x3a03 AM

Does anyone recognise what this encoding is?

Alex Angas
  • Note that it is likely that \x3a39 is a single unit - it would be in C, at any rate. If the string was a wide-character string, it might fit in a single character; in an 8-bit character string, it would overflow, and the value inserted is probably undefined (implementation defined at best). – Jonathan Leffler May 20 '09 at 20:11

4 Answers

63

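It's ASCII, written with hexadecimal escape sequences. Each four-character sequence \xST is converted to a single character whose ASCII code is ST (in hexadecimal), where S and T are each one of 0123456789abcdefABCDEF. For example, \x3a is the character with code 0x3A, which is the colon.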
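As a rough illustration of that rule, here is a minimal Python sketch. The helper name `decode_hex_escapes` is made up for this example, and it assumes exactly two hex digits follow each \x, as described above.

```python
import re

# Matches a backslash, 'x', and exactly two hexadecimal digits.
HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")

def decode_hex_escapes(text):
    """Replace each \\xST sequence with the character whose code is 0xST."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

# Raw string, so Python itself does not interpret the escapes.
print(decode_hex_escapes(r"12\x3a39\x3a03 AM"))  # 12:39:03 AM
```

Text without any \x sequences passes through unchanged, so the helper can be applied to a whole file line by line.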

pts
  • You can easily decode this type of text by just putting it into quotes in an interactive Python interpreter. – Paul Fisher May 20 '09 at 20:07
  • @PaulFisher can you provide a hint exactly how? – CodyBugstein Nov 01 '15 at 00:14
  • @Imray in a console, run 'python' (or 'ipython'), and then at the prompt, type in [ '12\x3a39\x3a03 AM' ] (i.e., everything between the brackets, including the quotes, but excluding the brackets). – Paul Fisher Nov 01 '15 at 16:36
  • Strange, I tried with '\x9fX#`\x9f' and it's kept like this, @PaulFisher – Revolucion for Monica Apr 29 '21 at 14:03
  • @RevolucionforMonica That's because in your case, `\x9f` is not a printable Unicode character -- it is the control character "Application Program Command". Python will escape unprintable characters when printing the `repr` of a string. https://www.compart.com/en/unicode/U+009F – Paul Fisher Apr 30 '21 at 16:56
  • Note that in C and C++, a `\x` sequence is not limited to two hex digits; it will use as many hex digits as follow the `\x`, producing an implementation-defined value. See C11 [§6.4.4.4 Character constants](https://port70.net/~nsz/c/c11/n1570.html#6.4.4.4) for more information; contrast the rules for octal and hexadecimal values. Other languages may have different rules. – Jonathan Leffler Jun 08 '22 at 20:09
27

The '\xAB' notation is used in C, C++, Perl, and other languages taking a cue from C, as a way of expressing hexadecimal character codes in the middle of a string.

The notation '\007' uses octal for the character code: when digits follow the backslash directly, they are read as an octal value.

In C99 and later, you can also use \uabcd and \U00abcdef to encode Unicode characters in hexadecimal (with 4 and 8 hex digits required; the first two hex digits in \U must be 0 to be valid, and often the third digit will be 0 too — 1 is the only other valid value).

Note that in C, octal escapes are limited to a maximum of 3 digits but hexadecimal escapes are not limited to 2 or 3 digits; the hexadecimal escape ends at the first character that's not a hexadecimal digit. In the question, the sequence is "12\x3a39\x3a03". That is a string containing 4 characters: 1, 2, \x3a39 and \x3a03. The actual value used for the 4-digit hex characters is implementation-defined. To achieve the desired result (using \x3A to represent a colon :), the code would have to use string concatenation:

"12\x3a" "39\x3a" "03"

This now contains 8 characters: 1, 2, :, 3, 9, :, 0, 3.
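To make the greedy rule concrete, here is a small Python sketch that imitates how a C compiler would split those literals into units. It is an emulation for illustration only, not C itself: the function name `split_c_units` is invented here, and it handles only \x escapes.

```python
import re

# C-style rule: \x is followed by one or more hex digits, taken greedily.
C_HEX_ESCAPE = re.compile(r"\\x[0-9a-fA-F]+")

def split_c_units(literal_body):
    """Split the body of a C string literal into the units a C compiler would see.

    Only \\x escapes are handled; other escapes are out of scope for this sketch.
    """
    units = []
    pos = 0
    while pos < len(literal_body):
        m = C_HEX_ESCAPE.match(literal_body, pos)
        if m:
            # The escape swallows every hex digit that follows it.
            units.append(m.group(0))
            pos = m.end()
        else:
            units.append(literal_body[pos])
            pos += 1
    return units

print(split_c_units(r"12\x3a39\x3a03"))
# ['1', '2', '\\x3a39', '\\x3a03']

# Adjacent literals are tokenized separately before being concatenated.
pieces = [r"12\x3a", r"39\x3a", "03"]
print([u for p in pieces for u in split_c_units(p)])
# ['1', '2', '\\x3a', '3', '9', '\\x3a', '0', '3']
```

The first call yields 4 units and the second yields 8, matching the counts given above.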

Jonathan Leffler
8

I use CyberChef for this sort of thing.

If you drop it in the input field and drag Magic from the Favourites list into the recipe it'll tell you the conversion and that you could've used the From_Hex recipe with a \x delimiter.

rich
0

I'm guessing that what you are dealing with is a Unicode string that was encoded differently from what the output stream it was sent to expects, e.g. a UTF-16 string output to a Latin-1 device. In that situation, certain characters are output as escape values to avoid sending control characters or wrong characters to the output device. This happens in Python at least.
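One way to see this kind of behaviour in Python is the `backslashreplace` error handler, which writes characters the target encoding cannot represent as escapes. This is only a sketch of one mechanism that could produce such output; whether it is what happened to the text in the question is a guess, and the sample string is made up.

```python
# Encoding to ASCII with 'backslashreplace' turns unrepresentable
# characters into escape sequences instead of raising an error.
text = "caf\u00e9"  # 'cafe' with an acute accent on the final letter
escaped = text.encode("ascii", errors="backslashreplace")
print(escaped)  # b'caf\\xe9'

# The 'unicode_escape' codec reverses that for this simple case.
restored = escaped.decode("unicode_escape")
print(restored == text)  # True
```

Note that this mechanism only escapes characters the target encoding cannot hold, so it would not by itself explain escaped colons or parentheses.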

lostlogic
  • So my question, @lostlogic: how does one get the original encoding that was supposed to have been sent? I have a similar problem. – unlockme Dec 27 '18 at 03:28