
I have a fairly large byte array in Python. In the simplest situation the byte array only contains 0 or 1 values (0x00, 0x01), and the array length is always a multiple of 8. How can I pack these "bits" into another byte array (it doesn't need to be mutable) so that source index zero goes to the MSB of the first output byte, and so on?

For example, if src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1]), the desired output would be b'\x89\xe2\xff'.

I could do it with a for loop, bit shifting, or-ing and concatenation, but surely there is a faster/better built-in way to do this.

In a follow-up question, I might also want the source byte array to contain values from the set 0-3 and pack these four at a time into the output array. Is there a way of doing that?

In general is there a way of interpreting elements of a list as true or false and packing them 8 at a time into a byte array?

Robotbugs
  • Can you use NumPy, or a third-party bit-twiddling library, or does it have to be builtins/stdlib only? – abarnert Jul 17 '18 at 04:42
  • Also, are you more concerned with performance, readability, or size of the code here? You could make this really concise (without third-party libs) with just a `ctypes.Structure` with 8 bit-fields, which you can construct from `bits(*src[:8])`, etc. and then cast or `Union` that with a `uint8`, but I'm not sure how understandable that is to someone who doesn't think in C, or how fast it is. – abarnert Jul 17 '18 at 04:48
  • I'd rather not use numpy at the moment if possible. Efficiency is more important than readability. I am fluent in C, but I don't know anything about python ctypes. – Robotbugs Jul 17 '18 at 04:49
  • 1
    Unless you have a *good* reason, "always use numpy" is a good rule. – o11c Jul 17 '18 at 05:29

2 Answers


As ridiculous as it may sound, the fastest solution using builtins may be to build a string and pass it to int, much as the fastest way to count 1-bits in an int is bin(n).count('1'). And it's dead simple, too:

def unbitify_byte(src):
    # build a string of binary digits, parse it as one big int,
    # then serialize that int MSB-first
    s = ''.join(map(str, src))
    n = int(s, 2)
    return n.to_bytes(len(src)//8, 'big')
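
With the example from the question, this should produce the desired output:

src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
print(unbitify_byte(src))  # b'\x89\xe2\xff'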

Equivalent (but marginally more complex) code using gmpy2 instead of native Python int is a bit faster.
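
As a rough sketch of what that might look like (assuming gmpy2 is installed; the function name here is just for illustration):

import gmpy2

def unbitify_byte_gmpy(src):
    # let gmpy2 parse the digit string, then hand the value back to a
    # native int for to_bytes()
    s = ''.join(map(str, src))
    n = int(gmpy2.mpz(s, 2))
    return n.to_bytes(len(src)//8, 'big')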

And you can extend it to 2-bit values pretty easily:

def unhalfnybblify_byte(src):
    # same trick, but treat the 0-3 values as base-4 digits (4 per output byte)
    s = ''.join(map(str, src))
    n = int(s, 4)
    return n.to_bytes(len(src)//4, 'big')
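
For instance, with some illustrative values (src[0] becomes the two most significant bits of the output byte):

src = bytearray([2, 0, 2, 1])    # base-4 digits 2021 -> 0b10001001
print(unhalfnybblify_byte(src))  # b'\x89'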

If you want something more flexible, but possibly slower, here's a simple solution using ctypes.

If you know C, you can probably see that a struct of 8 single-bit bit-fields would come in handy here. And you can write the equivalent struct type in Python like this:

import ctypes

class Bits(ctypes.Structure):
    _fields_ = [(f'bit{8-i}', ctypes.c_uint, 1) for i in range(8)]

And you can construct one of them from 8 ints that are all 0 or 1:

bits = Bits(*src[:8])

And you can convert that to a single int by using an ugly cast or a simple union:

class UBits(ctypes.Union):
    _fields_ = [('bits', Bits), ('i', ctypes.c_uint8)]

i = UBits(Bits(*src[:8])).i

So now it's just a matter of chunking src into groups of 8 in big-endian order:

# reverse each 8-element chunk so src index 0 ends up in the MSB field (bit1)
chunks = (src[i:i+8][::-1] for i in range(0, len(src), 8))
dst = bytearray(UBits(Bits(*chunk)).i for chunk in chunks)
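
With the question's example src, dst should come out as bytearray(b'\x89\xe2\xff') on a typical little-endian bit-field layout.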

And it should be pretty obvious how to extend this to four 2-bit fields, or two 4-bit fields, or even two 3-bit fields and a 2-bit field, per byte.
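
For example, the four-2-bit-fields case might look like this (a sketch only; the Crumbs/UCrumbs names are mine, and the same little-endian bit-field assumption applies):

class Crumbs(ctypes.Structure):
    # four 2-bit fields per output byte, declared low-to-high
    _fields_ = [(f'pair{4-i}', ctypes.c_uint, 2) for i in range(4)]

class UCrumbs(ctypes.Union):
    _fields_ = [('crumbs', Crumbs), ('i', ctypes.c_uint8)]

# chunks of 4 values (0-3), reversed so src index 0 gets the top two bits
chunks = (src[i:i+4][::-1] for i in range(0, len(src), 4))
dst = bytearray(UCrumbs(Crumbs(*chunk)).i for chunk in chunks)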

However, despite looking like low-level C code, it's probably slower. Still, it might be worth testing to see if it's fast enough for your uses.


A custom C extension can probably do better. And there are a number of bit-array-type modules on PyPI to try out. But if you want to go down that road, numpy is the obvious answer. You can't get any simpler than this:

np.packbits(src)

(A bytearray works just fine as an "array-like".)
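
For instance, with the question's example:

import numpy as np

src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
print(np.packbits(src).tobytes())  # b'\x89\xe2\xff'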

It's also hard to beat for speed.


For comparison, here are some measurements:

  • 60ns/byte + 0.3µs: np.packbits on an array instead of a bytearray
  • 60ns/byte + 1.9µs: np.packbits
  • 440ns/byte + 3.2µs: for and bit-twiddling in PyPy instead of CPython
  • 570ns/byte + 3.8µs: int(…, 2).to_bytes(…) in PyPy instead of CPython
  • 610ns/byte + 9.1µs: bitarray
  • 800ns/byte + 2.9µs: gmpy.mpz(…)…
  • 1.0µs/byte + 2.8µs: int(…, 2).to_bytes(…)
  • 2.9µs/byte + 0.2µs: (UBits(Bits(*chunk)) …)
  • 16µs/byte + 0.9µs: for and bit-twiddling
abarnert

Using numpy, with test code and comments:

#!/usr/bin/env python3
import numpy as np


def pack_bits(a):
    # big-endian - use '<u8' if you want little-endian
    #0000000A0000000B0000000C0000000D0000000E0000000F0000000G0000000H
    b = np.copy(a.view('>u8'))  # view every 8 input bytes as one 64-bit word
    #0000000A000000AB000000BC000000CD000000DE000000EF000000FG000000GH
    b |= b >> 7
    #0000000A000000AB00000ABC0000ABCD0000BCDE0000CDEF0000DEFG0000EFGH
    b |= b >> 14
    #0000000A000000AB00000ABC0000ABCD000ABCDE00ABCDEF0ABCDEFGABCDEFGH
    b |= b >> 28
    # truncating cast to u1 keeps only the low byte, which now holds ABCDEFGH
    return np.array(b, dtype='u1')

def main():
    a = []
    for i in range(256):
        # build 8-bit lists without numpy, then convert;
        # bin(256 + i)[2+1:] skips the '0b' prefix and the forced leading 1,
        # leaving exactly 8 digits
        a.append(np.array([int(b) for b in bin(256 + i)[2+1:]], dtype='u1'))
    a = np.array(a)
    print(a)
    b = pack_bits(a)
    print(b)

if __name__ == '__main__':
    main()
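
As a quick check against the question's example (pack_bits expects a numpy uint8 array whose length is a multiple of 8):

src = np.array([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1], dtype='u1')
print(pack_bits(src).tobytes())  # b'\x89\xe2\xff'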

Similar code exists for other deinterleaving, but since the number of bits between inputs is less than the number of bytes in a word, we can avoid the masking here (note that the 0ABCDEFG does not overlap the ABCDEFGH).

o11c