
I'm using the following code to pack an integer into an unsigned short:

import struct

raw_data = 40

# Pack into little endian
data_packed = struct.pack('<H', raw_data)

Now I'm trying to unpack the result as follows. I use utf-16-le since the data is encoded as little-endian.

import binascii

def get_bin_str(data):
    bin_asc = binascii.hexlify(data)
    result = bin(int(bin_asc.decode("utf-16-le"), 16))
    trimmed_res = result[2:]
    return trimmed_res

print(get_bin_str(data_packed))

Unfortunately, it throws the following error:

result = bin(int(bin_asc.decode("utf-16-le"), 16))
ValueError: invalid literal for int() with base 16: '㠲〰'

How do I properly decode the little-endian bytes to binary data?

user1241241
  • Why aren't you just using `struct.unpack`? – larsks Jun 19 '22 at 17:34
  • I'm trying to verify if the values are properly encoded so I need to use it this way. – user1241241 Jun 19 '22 at 17:35
  • 1
    What exactly are you trying to accomplish? "properly encoded" must depend on your purposes, because certainly Python will "properly" encode to the " – alexis Jun 19 '22 at 17:37
  • Let's assume that the integer that I'm trying to encode is 11. I want the binary to be `000000001011` and not `101100000000`. Using `utf-8` to decode works but it prints `101100000000` instead of `000000001011`. This is the reason why I'm trying to validate. – user1241241 Jun 19 '22 at 17:42
  • You are misusing UTF encodings, I don't think you understand what they do. In fact you are misusing most steps, e.g. the argument of `int()` must be a string of digits -- you pass it two chinese characters. – alexis Jun 19 '22 at 17:44
  • I'm not passing Chinese characters. I'm passing the packed data. – user1241241 Jun 19 '22 at 17:46
  • If you want to check the encoding, just inspect the least significant bit with `var & 1`. – alexis Jun 19 '22 at 17:46
  • What do you think `bin_asc.decode("utf-16-le")` does? It turns your data into characters. Print it, you'll see. – alexis Jun 19 '22 at 17:47
  • please explain what do you mean by `var & 1`? how do I extract the least significant bit? – user1241241 Jun 19 '22 at 17:49
  • `&` is the Python (and C) bitwise "and" operator. A little later when I have time I'll write up an answer. – alexis Jun 19 '22 at 17:53
  • thank you!.. or is there a better way to print the binary value after `struct.pack`? – user1241241 Jun 19 '22 at 17:54
  • `bin(data_packed[0]); bin(data_packed[1])` (They are 8-bit ints) – alexis Jun 19 '22 at 17:55
  • But don't be confused by what you'll see; I'll explain later. – alexis Jun 19 '22 at 17:56
  • thank you! I'm so interested to know how it worked! and the mistake that i was committing. – user1241241 Jun 19 '22 at 17:57

2 Answers


Use unpack to reverse what you packed. The data isn't UTF-encoded so there is no reason to use UTF encodings.

>>> import struct
>>> data_packed = struct.pack('<H', 40)
>>> data_packed.hex()   # the two little-endian bytes are 0x28 (40) and 0x00 (0)
'2800'
>>> data = struct.unpack('<H',data_packed)
>>> data
(40,)

unpack returns a tuple, so index it to get the single value:

>>> data = struct.unpack('<H',data_packed)[0]
>>> data
40

To print in binary, use string formatting. Either of these works well. bin() doesn't let you specify the number of binary digits to display, and the 0b prefix has to be removed if it is not wanted.

>>> format(data,'016b')
'0000000000101000'
>>> f'{data:016b}'
'0000000000101000'
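
Putting the two steps together, a corrected version of the question's helper might look like the sketch below. The name get_bin_str comes from the question; the width parameter is an assumption added here for illustration, and the '<H' format is assumed to match whatever was used when packing.

```python
import struct


def get_bin_str(data, width=16):
    """Return the bits of a packed little-endian unsigned short as a string."""
    # '<H' must match the format used when packing; unpack returns a 1-tuple.
    (value,) = struct.unpack('<H', data)
    # Zero-pad to `width` binary digits, with no '0b' prefix.
    return format(value, f'0{width}b')


data_packed = struct.pack('<H', 40)
print(get_bin_str(data_packed))  # 0000000000101000
```

Because the value is unpacked with the same byte order it was packed with, the bits come out most significant first regardless of how the bytes are laid out in memory.
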
Mark Tolonen

You have not said what you are trying to do, so let's assume your goal is to educate yourself. (If you are trying to pack data that will be passed to another program, the only reliable test is to check if the program reads your output correctly.)

Python does not have an "unsigned short" type, so the output of struct.pack() is a bytes object (an immutable byte string). To see what's in it, just print it:

>>> import struct
>>> data_packed = struct.pack('<H', 40)
>>> print(data_packed)
b'(\x00'

What's that? Well, it is the character (, which is decimal 40 in the ASCII table, followed by a null byte. If you had used a number that does not map to a printable ASCII character, you'd see something less surprising:

>>> struct.pack("<H", 11)
b'\x0b\x00'

Where 0b is 11 in hex, of course. Wait, I specified "little-endian", so why is my number on the left? The answer is, it's not. Python prints the byte string left to right because that's how English is written, but that's irrelevant. If it helps, think of strings as growing upwards: From low memory locations to high memory. The least significant byte comes first, which makes this little-endian.
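
A small sketch may make the byte order easier to see. It packs the same value both ways and then reads the little-endian bytes back with each byte order; the variable names are only illustrative.

```python
import struct

value = 11

little = struct.pack('<H', value)  # least significant byte first
big = struct.pack('>H', value)     # most significant byte first

print(little)  # b'\x0b\x00'
print(big)     # b'\x00\x0b'

# Reading the bytes back with the matching byte order recovers the value.
print(int.from_bytes(little, 'little'))  # 11

# Reading little-endian bytes as if they were big-endian gives 0x0b00.
wrong = int.from_bytes(little, 'big')
print(wrong, bin(wrong))  # 2816 0b101100000000
```

The last line is the 101100000000 pattern mentioned in the comments on the question: the bytes are fine, they are just being interpreted in the wrong order.
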

Anyway, you can also look at the bytes directly:

>>> print(data_packed[0])
40

Yup, it's still there. But what about the bits, you say? For this, use bin() on each of the bytes separately:

>>> bin(data_packed[0])
'0b101000'
>>> bin(data_packed[1])
'0b0'

The two set bits (the 1s) you see are worth 32 and 8, and 32 + 8 = 40. Your number was less than 256, so it fits entirely in the low byte of the short you constructed.
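
Since bin() drops leading zeros, it can help to pad each byte to 8 digits. The following is a minimal sketch, assuming the data was packed with '<H' as in the question; per_byte and bits are names made up for the example.

```python
import struct

data_packed = struct.pack('<H', 40)

# Pad every byte to 8 binary digits so leading zeros are visible.
per_byte = [format(b, '08b') for b in data_packed]
print(per_byte)  # ['00101000', '00000000']

# In little-endian data the last byte is the most significant one,
# so reverse the bytes to write the bits in the usual order.
bits = ''.join(reversed(per_byte))
print(bits)  # 0000000000101000

# The least significant bit can be tested with a bitwise AND.
print(data_packed[0] & 1)  # 0, because 40 is even
```
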

What's wrong with your unpacking code?

Just for fun let's see what your sequence of transformations in get_bin_str was doing.

>>> import binascii
>>> binascii.hexlify(data_packed)
b'2800'

Um, all right. Not sure why you converted to hex digits, but now you have 4 bytes, not two. (28 is the number 40 written in hex, the 00 is for the null byte.) In the next step, you call decode and tell it that these 4 bytes are actually UTF-16; there's just enough for two unicode characters, let's take a look:

>>> b'2800'.decode("utf-16-le")
'㠲〰'

In the next step Python finally notices that something is wrong, but by then it does not make much difference because you are pretty far away from the number 40 you started with.
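
To see where those two characters come from, here is a short sketch that prints the intermediate byte values. It only uses the calls already shown in the question; the variable names are illustrative.

```python
import binascii
import struct

data_packed = struct.pack('<H', 40)
hex_bytes = binascii.hexlify(data_packed)  # b'2800'

# hexlify returns ASCII digits, so the byte values are character codes.
print([hex(b) for b in hex_bytes])  # ['0x32', '0x38', '0x30', '0x30']

# UTF-16-LE combines each pair of bytes into one code unit, low byte first.
decoded = hex_bytes.decode('utf-16-le')
print([hex(ord(ch)) for ch in decoded])  # ['0x3832', '0x3030']

# Decoding as ASCII keeps the hex digits; base 16 then reads them as
# a big-endian number, so 40 comes back as 0x2800.
print(int(hex_bytes.decode('ascii'), 16))  # 10240
```

So the ASCII codes of the digits '2' and '8' were glued together into the code point U+3832, and '0' and '0' into U+3030, neither of which is a hex digit that int() can parse.
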

To correctly read your data as a UTF-16 character, call decode directly on the byte string you packed.

>>> data_packed.decode("utf-16-le")
'('
>>> ord('(')
40
alexis