
In Python 2, converting the hexadecimal form of a string into the corresponding unicode was straightforward:

comments.decode("hex")

where the variable 'comments' is part of a line in a file (the rest of the line does not need to be converted, as it is represented only in ASCII).

Now in Python 3, however, this doesn't work (I assume because of the bytes/string vs. string/unicode switch). I feel like there should be a one-liner in Python 3 that does the same thing, rather than reading the entire line as a series of bytes (which I don't want to do) and then converting each part of the line separately. If possible, I'd like to read the entire line as a unicode string (because the rest of the line is in unicode) and convert only this one part from its hexadecimal representation.
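A minimal sketch of what the question describes, assuming (hypothetically) that each line is read as an ordinary `str` and that the hex-encoded comment is the last tab-separated field, which was UTF-8 before it was hex encoded. Only that field goes through `bytes.fromhex()`; the rest of the line is never turned into bytes. The helper name `decode_comment_field` and the line layout are made up for illustration.

```python
def decode_comment_field(line: str, encoding: str = "utf-8") -> str:
    """Return `line` with its last tab-separated field decoded from hex.

    Hypothetical helper: the tab-separated layout and the UTF-8 default are
    assumptions for this illustration; adjust them to the real file format.
    """
    # Split off only the last field; everything before it is left untouched.
    prefix, sep, comments = line.rstrip("\n").rpartition("\t")
    # bytes.fromhex() turns pairs of hex digits into raw bytes, and
    # .decode() turns those bytes into a str using the given encoding.
    decoded = bytes.fromhex(comments).decode(encoding)
    return prefix + sep + decoded


line = "id=7\tauthor=ann\t4a4b4c\n"
print(repr(decode_comment_field(line)))  # 'id=7\tauthor=ann\tJKL'
```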

chimeracoder
  • I'm not sure that hex encoding strings makes all that much sense. If you want to store an incompatible encoding I'd at least use base 64 because it is more efficient. That doesn't invalidate the question / answer in any way of course, maybe somebody else decided upon hex. – Maarten Bodewes Nov 15 '18 at 12:54
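To make the size argument in this comment concrete: hex uses two output characters per byte, while Base64 uses four characters per three bytes (plus padding). A quick standard-library check on an arbitrary sample string:

```python
import base64

# Arbitrary sample text; "ï" and "é" each take two bytes in UTF-8.
text = "naïve café"
data = text.encode("utf-8")

hex_form = data.hex()                              # 2 characters per byte
b64_form = base64.b64encode(data).decode("ascii")  # 4 characters per 3 bytes

print(len(data), len(hex_form), len(b64_form))     # 12 24 16

# Both forms round-trip back to the original text.
assert bytes.fromhex(hex_form).decode("utf-8") == text
assert base64.b64decode(b64_form).decode("utf-8") == text
```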

3 Answers


Something like:

>>> bytes.fromhex('4a4b4c').decode('utf-8')
'JKL'

Just put the actual encoding you are using.
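To illustrate why the choice of encoding matters: the same bytes decode to different text under different codecs. Here `c3a9` is the UTF-8 encoding of "é"; decoding it as Latin-1 instead produces two unrelated characters.

```python
raw = bytes.fromhex("c3a9")

print(raw.decode("utf-8"))    # é
print(raw.decode("latin-1"))  # Ã©
```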

unbeli
  • Unless the decoded string *is* actually utf-8, I would recommend using `decode('ascii')` instead. – Ja͢ck Mar 10 '14 at 02:53
  • @Ja͢ck You could encode to hex if you knew that the Unicode string is incompatible with the encoding used for storing the string. If a string is already known to be ASCII, then there is no need to encode it as a hexadecimal string in the first place. – Maarten Bodewes Nov 15 '18 at 12:52
  • Doesn't work for all hex strings, though. For instance, `bytes.fromhex('82').decode('utf-8')` raises `UnicodeDecodeError`. Using the `'ascii'` codec doesn't fix the problem, since that will fail for bytes with values >127. – HackerBoss Sep 12 '19 at 18:03
  • That's because 0x82 really isn't a valid UTF-8 sequence. Your comment is trivially true in that hex strings which aren't valid UTF-8 cannot be decoded, but that would be true for any other representation of those sequences too. – tripleee Dec 18 '19 at 18:25
  • @MaartenBodewes There are reasons to encode to hex even if it's already ASCII. For example if you want to use it as a file name, you might want to avoid having any characters such as '/' or '\', and hex encoding would fix that. – Buge Feb 22 '20 at 22:45
  • @Buge That's a good point, although base64url might make more sense for that particular use case. – Maarten Bodewes Feb 22 '20 at 23:01
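Picking up the `0x82` example from the comments: a lone `0x82` is a UTF-8 continuation byte, so strict decoding fails. The `errors` argument of `bytes.decode()` selects a different policy: `'replace'` substitutes U+FFFD, and `'backslashreplace'` keeps the offending byte visible as an escape. Which policy is appropriate depends on the data, so treat this as a sketch rather than a fix.

```python
raw = bytes.fromhex("4a82")  # "J" followed by a stray continuation byte

try:
    raw.decode("utf-8")  # errors="strict" is the default
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)  # strict: invalid start byte

# U+FFFD REPLACEMENT CHARACTER stands in for the undecodable byte.
print(raw.decode("utf-8", errors="replace"))
# The undecodable byte is kept as the literal text \x82.
print(raw.decode("utf-8", errors="backslashreplace"))  # J\x82
```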
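import codecs

# The "hex_codec" decoder returns a (decoded_bytes, length_consumed) tuple.
decode_hex = codecs.getdecoder("hex_codec")

msgs = ["4a4b4c", "6869"]  # example input
string = "4a4b4c"          # example input

# for a list of hex strings
msgs = [decode_hex(msg)[0] for msg in msgs]  # [b'JKL', b'hi']

# for a single string
string = decode_hex(string)[0]  # b'JKL'

# The result is bytes; decode it with the real text encoding to get a str
text = string.decode("utf-8")  # 'JKL'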
Niklas
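A shorter spelling of the same `codecs` approach for Python 3.4 and later: `codecs.decode()` accepts the `'hex'` alias directly and returns only the decoded bytes, without the length tuple. The result is still `bytes`, so it needs a `.decode()` with the real text encoding (UTF-8 is assumed here). It matches what `bytes.fromhex()` produces.

```python
import codecs

raw = codecs.decode("4a4b4c", "hex")  # b'JKL'
assert raw == bytes.fromhex("4a4b4c")

text = raw.decode("utf-8")  # assumes the original text was UTF-8
print(text)  # JKL
```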

The answers from @unbeli and @Niklas are good, but @unbeli's answer does not work for all hex strings, and it is desirable to do the decoding without importing an extra module (codecs). The following should work (but will not be very efficient for large strings):

>>> result = bytes.fromhex((lambda s: ("%s%s00" * (len(s)//2)) % tuple(s))('4a82fdfeff00')).decode('utf-16-le')
>>> result == '\x4a\x82\xfd\xfe\xff\x00'
True

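Basically, it works around invalid UTF-8 byte sequences by inserting a zero byte after each byte and decoding the result as little-endian UTF-16 (`utf-16-le`), so each byte becomes the character whose code point has the same value.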
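Why the trick works: inserting a zero byte after each byte and decoding as UTF-16-LE maps every byte value to the code point with the same value (U+0000 to U+00FF). The Latin-1 (ISO-8859-1) codec does exactly that, so the same result can be had without the string manipulation. This gives the intended text only if the original bytes really were Latin-1, or if all you need is a reversible byte-to-character mapping.

```python
hex_string = "4a82fdfeff00"

# Zero-padding trick from the answer: "4a" -> "4a00", etc., decoded as UTF-16-LE.
padded = "".join(hex_string[i:i + 2] + "00" for i in range(0, len(hex_string), 2))
via_utf16 = bytes.fromhex(padded).decode("utf-16-le")

# Latin-1 maps each byte 0x00-0xFF to the code point with the same value.
via_latin1 = bytes.fromhex(hex_string).decode("latin-1")

print(via_utf16 == via_latin1 == "\x4a\x82\xfd\xfe\xff\x00")  # True
```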

HackerBoss
  • You are misunderstanding how UTF-8 works. But if your input is UTF-16 (or more properly the pure 16-bit UCS-2 subset) this is useful. – tripleee Dec 18 '19 at 18:40