How do I create a Python bytes object in the C API

Question

I have a NumPy vector of bools and I'm trying to use the C API to get a bytes object from it as quickly as possible. (Ideally, I want to map the binary value of the vector to the bytes object.)

I can read in the vector successfully and I have the data in bool_vec_arr. I thought of creating an int and setting its bits in this way:

PyBytesObject * pbo; 
int byte = 0;
int i = 0;
while ( i < vec->dimensions[0] )  
{
    if ( bool_vec_arr[i] )
    {
        byte |= 1UL << i % 8;
    }
    i++;
    if (i % 8 == 0)
    {
        /* do something here? */
        byte = 0;
    }
}
return PyBuildValue("S", pbo);

But I'm not sure how to use the value of byte in pbo. Does anyone have any suggestions?

Side-note: `return PyBuildValue("S", pbo);` is pointless, and just guarantees a reference leak (it increments the reference count on `pbo` and returns it otherwise unchanged, but you had to pay the expense of parsing the format string to do it). You should just be doing `return pbo;` directly, or if you've stored off a copy of `pbo` somewhere so you can't give up your own reference, `Py_INCREF(pbo);` before `return pbo;` — ShadowRanger, Apr 27 '19 at 00:47

score 5 · Accepted Answer · answered Apr 27 '19 at 00:43

You need to store each byte as you complete it. The problem is that you never create an actual bytes object to populate, so do that first. You know how long the result must be (one-eighth the size of the bool vector, rounded up), so use PyBytes_FromStringAndSize to get a bytes object of the correct size, then populate it as you go.

You'd just allocate with:

// Preallocate enough bytes (PyBytes_FromStringAndSize returns a PyObject *)
PyObject *pbo = PyBytes_FromStringAndSize(NULL, (vec->dimensions[0] + 7) / 8);
if (pbo == NULL)
    return NULL;  // Allocation failed; the exception is already set

// Extract pointer to underlying buffer
char *bytebuffer = PyBytes_AsString(pbo);

Adding 7 before dividing by 8 rounds up, so you have enough bytes for all the bits. Then assign to the appropriate index each time you finish a byte, e.g.:

if (i % 8 == 0)
{
    bytebuffer[i / 8 - 1] = byte;  // Store completed byte to next index
    byte = 0;
}

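If the final byte might be incomplete, you'll need to decide how to handle it (do the pad bits appear on the left or right, or is the final byte omitted, in which case you shouldn't round up the allocation, etc.). Note that the loop as written never stores a trailing partial byte, and PyBytes_FromStringAndSize(NULL, ...) leaves the buffer contents uninitialized, so with the rounded-up allocation the last byte would hold garbage unless you write it explicitly.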
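To make the rounding and the trailing-byte handling concrete, here is a minimal sketch of the packing loop in plain C, independent of the Python headers. The names `packed_size` and `pack_bools` are hypothetical helpers introduced for illustration, not part of the Python or NumPy C API. The sketch assumes one choice among those listed above: bits are packed least-significant first (matching `1 << i % 8` in the question) and the pad bits of a partial final byte are left as zero.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed to hold n_bits bits (ceiling of n_bits / 8). */
static size_t packed_size(size_t n_bits)
{
    return (n_bits + 7) / 8;
}

/*
 * Hypothetical helper: pack n_bits flags (one per input byte, nonzero = true)
 * into out, least-significant bit first.
 * out must have room for packed_size(n_bits) bytes.
 */
static void pack_bools(const unsigned char *bools, size_t n_bits,
                       unsigned char *out)
{
    unsigned char byte = 0;
    size_t i;

    for (i = 0; i < n_bits; i++) {
        if (bools[i])
            byte |= (unsigned char)(1u << (i % 8));
        if (i % 8 == 7) {           /* byte is full: store it and reset */
            out[i / 8] = byte;
            byte = 0;
        }
    }
    if (n_bits % 8 != 0)            /* flush a trailing partial byte */
        out[n_bits / 8] = byte;     /* unused high bits stay zero */
}
```

In the extension function, `out` would presumably be the pointer returned by PyBytes_AsString (cast to `unsigned char *`) for a bytes object allocated with `packed_size(vec->dimensions[0])` bytes, after which you can `return pbo;` directly.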
