3

I've got a binary blob encoded as a hex string, like:

input = "AB02CF4AFF"

Every pair ("AB", "02", "CF", "4A", "FF") constitutes a byte. I'm doing this:

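from StringIO import StringIO  # Python 2 module

data = StringIO()
# decode two hex digits at a time and write the resulting character
for j in range(0, len(input) // 2):
    bit = input[j*2:j*2+2]
    data.write('%c' % int(bit, 16))
data.seek(0)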

This works OK, but with large binary blobs it becomes unacceptably slow and sometimes even throws a MemoryError.

struct.unpack comes to mind, but no luck thus far.

Any way to speed this up?

Tim Knip

3 Answers

5

Use binascii.unhexlify:

>>> import binascii
>>> binascii.unhexlify('AB02CF4AFF')
b'\xab\x02\xcfJ\xff'

(In Python 2 you can decode with the hex codec but this isn't portable to Python 3.)
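As a rough sketch of how this could replace the loop in the question: the snippet below assumes Python 3 and assumes a file-like object is still wanted at the end. The helper name `hex_to_stream` is made up for illustration and is not part of `binascii`.

```python
import binascii
import io


def hex_to_stream(hex_string):
    """Decode the whole hex string in one call and wrap it in a file-like object."""
    # unhexlify raises binascii.Error for an odd number of digits or non-hex characters
    raw = binascii.unhexlify(hex_string)
    return io.BytesIO(raw)


stream = hex_to_stream("AB02CF4AFF")
data = stream.read()
```

Because the conversion happens in a single C-level call instead of one `write` per byte, it should avoid most of the per-iteration overhead of the original loop.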

Gareth Rees
4

Give input.decode('hex') a try :)

It's always a good idea to use built-in solutions.
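Note that `str.decode('hex')` only exists in Python 2. If you are on Python 3, the closest built-in I know of is `bytes.fromhex`; a small sketch, assuming Python 3.5 or later for the `bytes.hex` round trip:

```python
hex_string = "AB02CF4AFF"

# Python 3 counterpart of the Python 2 call hex_string.decode('hex')
data = bytes.fromhex(hex_string)

# Going back the other way; bytes.hex() returns lowercase digits
round_trip = data.hex().upper()
```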

immortal
1

How about something like this?

def chrToInt(c):
    if c >= '0' and c <= '9':
        return int(ord(c) - ord('0'))
    elif c >= 'A' and c <= 'F':
        return int(ord(c) - ord('A')) + 10
    else:
        # invalid hex character (note that lowercase a-f is not handled here)
        raise ValueError("invalid hex character: %r" % c)

def hexToBytes(input):
    bytes = []

    for i in range(0, len(input) - 1, 2):
        val = (chrToInt(input[i]) * 16) + chrToInt(input[i + 1])

        bytes.append(val)

    return bytes

print(hexToBytes("AB02CF4AFF"))

You could probably speed it up by making chrToInt branchless with bitwise operations. You could also modify hexToBytes to take the number of characters to read per value, in case you want something bigger than bytes (so it reads groups of 4 characters for a short or 8 for an int).
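Here is one way those two suggestions might look, as an untested-for-speed sketch in Python 3. The names `chr_to_int`, `hex_to_ints` and `digits_per_value` are only illustrative. The bit trick relies on ASCII layout: the low four bits of '0'-'9' are already the digit value, and letters have bit 6 set, so adding 9 turns 'A'/'a' (low bits 1) into 10. It does no validation at all, and in CPython it is not guaranteed to beat the built-in converters.

```python
def chr_to_int(c):
    """Branch-free value of one ASCII hex digit; invalid input is NOT detected."""
    code = ord(c)
    # digits: (code >> 6) == 0; letters A-F / a-f: (code >> 6) == 1, so add 9
    return (code & 0x0F) + 9 * (code >> 6)


def hex_to_ints(hex_string, digits_per_value=2):
    """Read digits_per_value hex digits at a time and return the values as ints."""
    if len(hex_string) % digits_per_value:
        raise ValueError("length is not a multiple of the group size")
    values = []
    for start in range(0, len(hex_string), digits_per_value):
        value = 0
        # most significant digit first
        for c in hex_string[start:start + digits_per_value]:
            value = (value << 4) | chr_to_int(c)
        values.append(value)
    return values


byte_values = hex_to_ints("AB02CF4AFF")                     # 2 digits per byte
short_values = hex_to_ints("AB02CF4A", digits_per_value=4)  # 4 digits per 16-bit value
```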

Jessie
  • The answer @immortal gave, `input.decode('hex')`, is way faster for my use case (I need the binary string as input for PIL Image.frombuffer). But thanks for answering, this could come in handy for other cases. – Tim Knip Dec 22 '13 at 19:11