
I am comfortable with Python 3.x and with bytearray-to-decimal conversion using int.from_bytes(), and came up with the conversion snippet below. Is there a way to achieve the same functionality in Python 2, for both positive and negative integers?

val = bytearray(b'\x8f\x0f\xfd\x02\xf4\x95s\x00\x00')
a = int.from_bytes(val, byteorder='big', signed=True)

# print(type(a), type(val), val, a)
# <class 'int'> <class 'bytearray'> bytearray(b'\x8f\x0f\xfd\x02\xf4\x95s\x00\x00') -2083330000000000000000

I need to use only the Python 2.7 standard library to convert a bytearray to an int.

E.g. bytearray(b'\x00') --> Expected Result: 0
bytearray(b'\xef\xbc\xa9\xe5w\xd6\xd0\x00\x00') --> Expected Result: -300000000000000000000
bytearray(b'\x10CV\x1a\x88)0\x00\x00') --> Expected Result: 300000000000000000000
Dud
  • You probably better use `pack` for this. – Willem Van Onsem Apr 18 '18 at 17:56
  • I'm actually already surprised that Python-3.x does this, since in Python-3.x, an `int` has arbitrary size. – Willem Van Onsem Apr 18 '18 at 17:57
  • Sure Willem. Will see the docs for it now. But will that handle signed bytearray-to-int conversion, or only positive integers? – Dud Apr 18 '18 at 17:58
  • Before posting, consider copying and pasting it into google search bar. – wim Apr 18 '18 at 17:59
  • Yup, it does perfectly. I'm on Python 3.6.1 | Anaconda 4.4.0. When I try this snippet locally in my higher version of Python it works, but in 2.7 it does not. – Dud Apr 18 '18 at 18:00
  • OK, I won't lie: I did check Google before posting, but the one you led me to seems to accept only "little endian byte order". – Dud Apr 18 '18 at 18:03
  • @WillemVanOnsem It's a perfectly sensible thing to do—but it isn't clear what the intended use is. Arbitrary-length signed ints aren't exactly a common interchange format. When you do see them, they're often in fixed chunks of 4 bytes, not 1 byte—or they're in gmp export format, which `from_bytes` would require a wordsize and nails arguments to handle, at which point it might as well default to `size=4, nails=2` for export so it could just dump its internal format. – abarnert Apr 19 '18 at 04:53

1 Answer


There is no built-in function in Python 2.7 to do the equivalent of int.from_bytes in 3.2+; that's why the method was added in the first place.

If you don't care about handling any cases other than big-endian signed ints, and care about readability more than performance (so you can extend it or maintain it yourself), the simplest solution is probably an explicit loop over the bytes.


For unsigned, this would be easy:

n = 0
for by in b:
    n = n * 256 + by
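
As a quick check, here is the same loop wrapped in a function. The name `uint_from_bytes` is only illustrative, and the `bytearray()` call is an added assumption so that the loop sees ints on both Python 2 and Python 3:

```python
def uint_from_bytes(b):
    # Hypothetical helper: interpret b as a big-endian unsigned integer.
    n = 0
    for by in bytearray(b):  # bytearray() yields ints on Python 2 and 3
        n = n * 256 + by
    return n

print(uint_from_bytes(b'\x01\x00'))  # 256
print(uint_from_bytes(b'\xff'))      # 255, not -1: the sign bit is ignored
```

The second call shows why the unsigned loop alone is not enough for signed input.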

But to handle negative numbers, you need to do three things:

  • Take off the sign bit from the highest byte. Since we only care about big-endian, this is the 0x80 bit on b[0].
  • That makes an empty bytearray a special case, so handle that specially.
  • At the end, if the sign bit was set, subtract 2**(bits-1) to get the two's-complement value.

So:

def int_from_bytes(b):
    '''Convert big-endian signed integer bytearray to int

    int_from_bytes(b) == int.from_bytes(b, 'big', signed=True)'''
    if not b: # special-case 0 to avoid b[0] raising
        return 0
    n = b[0] & 0x7f # skip sign bit
    for by in b[1:]:
        n = n * 256 + by
    if b[0] & 0x80: # if sign bit is set, 2's complement
        bits = 8*len(b)
        offset = 2**(bits-1)
        return n - offset
    else:
        return n

(This works on any sequence of ints that supports indexing, slicing, and len(). In Python 3, that includes both bytes and bytearray; in Python 2, it includes bytearray but not str.)


Testing your inputs in Python 3:

>>> for b in (bytearray(b'\x8f\x0f\xfd\x02\xf4\x95s\x00\x00'),
...           bytearray(b'\x00'),
...           bytearray(b'\xef\xbc\xa9\xe5w\xd6\xd0\x00\x00'),
...           bytearray(b'\x10CV\x1a\x88)0\x00\x00')):
...     print(int.from_bytes(b, 'big', signed=True), int_from_bytes(b))
-2083330000000000000000 -2083330000000000000000
0 0
-300000000000000000000 -300000000000000000000
300000000000000000000 300000000000000000000

And in Python 2:

>>> for b in (bytearray(b'\x8f\x0f\xfd\x02\xf4\x95s\x00\x00'),
...           bytearray(b'\x00'),
...           bytearray(b'\xef\xbc\xa9\xe5w\xd6\xd0\x00\x00'),
...           bytearray(b'\x10CV\x1a\x88)0\x00\x00')):
...     print int_from_bytes(b)
-2083330000000000000000
0
-300000000000000000000
300000000000000000000

If this is a bottleneck, there are almost surely faster ways to do this, for example via gmpy2. In fact, even converting the bytes to a hex string with binascii.hexlify and parsing that string with int(..., 16) might be faster, even though it's more than twice the work, because in CPython both of those loops run in C rather than in Python. Or you could combine the results of calling struct.unpack_from on 8 bytes at a time instead of handling each byte one by one. But this version should be easy to understand and maintain, and doesn't require anything outside the stdlib.
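
A minimal sketch of the hex-string route, assuming only `binascii` from the standard library. The name `int_from_bytes_hex` is illustrative, not an existing API. Parsing the hex digits gives the unsigned value, so the sign is restored afterwards by subtracting 2**bits when the top bit is set:

```python
import binascii

def int_from_bytes_hex(b):
    # Hypothetical helper: big-endian signed bytes -> int via a hex string.
    if not b:  # int('', 16) would raise ValueError
        return 0
    # bytes(b) gives str on Python 2 and bytes on Python 3.
    n = int(binascii.hexlify(bytes(b)), 16)  # unsigned value
    bits = 8 * len(b)
    if n >= 1 << (bits - 1):  # top (sign) bit is set
        n -= 1 << bits        # map into the negative range
    return n

print(int_from_bytes_hex(bytearray(b'\xff')))  # -1
```

I have not timed this against the loop version, so treat any speedup as something to measure on your own data.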

abarnert
  • Thanks a lot :) Works smoothly. I have modified the question title as well. @Moderators, please feel free to update it if it still seems misleading. Nice snippet by abarnert for people with similar requirements. – Dud Apr 19 '18 at 04:25
  • For the record, converting to hex and parsing is *much* faster; `int(binascii.hexlify(mybytes), 16)` is *really* fast (I've had cause to reinvent this before, and that solution was orders of magnitude faster than any other option). – ShadowRanger Apr 19 '18 at 23:58