How to convert \x after base64 decoding into something readable?

Question

I'm trying to decode the following Base64 string into readable text.

T2ggeWVhaCEgAQ==

I'm using Python's base64 module to do so. However, when I do, I get:

>>> import base64
>>> base64.b64decode("T2ggeWVhaCEgAQ==")
'Oh yeah! \x01'

What is \x01?

How do I decode it so that all the text is readable and I don't get any weird symbols?

score 1 · Answer 1 · answered Nov 01 '15 at 01:01

You could filter out the unreadable characters:

import base64
from string import printable
decoded = base64.b64decode('T2ggeWVhaCEgAQ==').decode('ascii')  # bytes -> text
print(''.join(c for c in decoded if c in printable))  # drops '\x01'

answered Nov 01 '15 at 01:01

tdelaney

jfs · Accepted Answer · 2015-11-01T20:31:07.683

'\x01' is how Python 2 shows a single byte with the value 1 in the text representation (repr) of a byte string. Bytes in the printable ASCII range represent themselves, e.g. you see 'O' instead of '\x4f':

>>> b'\x4f\x68\x20\x79\x65\x61\x68\x21\x20\x01'
'Oh yeah! \x01'
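
One way to see what is behind the escapes is to list the numeric value of each decoded byte. This is a minimal sketch using the Base64 string from the question; the names raw, values and non_printing are only illustrative, and the printing range 0x20..0x7e is an assumption about what counts as "readable":

```python
import base64

raw = base64.b64decode("T2ggeWVhaCEgAQ==")

# bytearray yields integers on both Python 2 and Python 3
values = list(bytearray(raw))
print(values)
# -> [79, 104, 32, 121, 101, 97, 104, 33, 32, 1]

# Assumption: "readable" means printing ASCII, 0x20..0x7e.
# Anything outside that range is what the repr shows as a \x.. escape.
non_printing = [v for v in values if not 0x20 <= v <= 0x7e]
print(non_printing)
# -> [1]
```

The only value outside the printing range is the final 1, which is the byte displayed as '\x01'.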

To remove all "weird" bytes, that is, to keep only the characters in string.printable:

#!/usr/bin/env python
import string

weird = bytearray(set(range(0x100)) - set(map(ord, string.printable)))
print(b'Oh yeah! \x01'.translate(None, weird).decode())
# -> Oh yeah!

string.printable contains some non-printing characters such as '\t' (tab) and '\n' (newline). To exclude them too and leave only printing characters:

printing_chars = range(0x20, 0x7e + 1)
weird = bytearray(set(range(0x100)) - set(printing_chars))
print(b'Oh yeah! \x01'.translate(None, weird).decode())
# -> Oh yeah!

edited Nov 01 '15 at 20:31

answered Nov 01 '15 at 19:59

jfs

I get that `\x01` is a smiley face icon – CodyBugstein Nov 02 '15 at 00:16
`'\x01'` is not a smiley face icon. It is a byte (its numerical value is one). It is a control character. If you print it, it is invisible. – jfs Nov 02 '15 at 00:38
When I do `print 'Oh yeah! \x01'` in the Python console, I get: `Oh yeah! ☺` – CodyBugstein Nov 02 '15 at 04:30
I've enumerated all character encodings available on my machine -- none produces the smiley for `'\x01'` (`u'\u263a'.encode(encoding)`). What is `sys.stdout.encoding` in your case? What is the font? – jfs Nov 02 '15 at 11:29
[Yes, in cp437, it seems some control characters can be displayed differently](https://en.wikipedia.org/wiki/Code_page_437) in [some contexts](http://stackoverflow.com/a/14553297/4279). `b'\x01'.decode('cp437') == u'\x01' != u'\u263a'` and `u'\u263a'.encode('cp437')` raises `UnicodeEncodeError` on my system. For portable code, use Unicode: `print(u'Oh yeah! \u263a')` (to make it work on Windows, install the `win-unicode-console` package) – jfs Nov 02 '15 at 15:02

score 0 · Answer 3 · answered Nov 01 '15 at 00:37

The last byte of the Base64 encoded data is hex 01. This isn't a printable character in any commonly used encoding; there's no way to make it into "readable text" without turning it into something it isn't.
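
If silently dropping the byte is not acceptable, one alternative is to keep it but show it as a visible escape. This is a rough sketch, not a standard recipe: show_escaped is a hypothetical helper name, and treating 0x20..0x7e as the printing range is an assumption. Note that it really does turn the byte into something it isn't, namely the four characters \x01:

```python
import base64


def show_escaped(data):
    """Return text in which non-printing bytes appear as \\xNN escapes."""
    return ''.join(
        # Assumption: printing ASCII is 0x20..0x7e; everything else is escaped
        chr(b) if 0x20 <= b <= 0x7e else '\\x%02x' % b
        for b in bytearray(data)  # bytearray yields ints on Python 2 and 3
    )


print(show_escaped(base64.b64decode("T2ggeWVhaCEgAQ==")))
# -> Oh yeah! \x01
```

A literal backslash in the input is left unchanged, so the output is meant for display only and cannot be reliably converted back to the original bytes.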
