Pickle Cython Class with C pointers

Question

I am trying to write a __reduce__() method for a Cython class that contains C pointers, but so far I have found very little information on the best way to go about this. There are plenty of examples of how to write a __reduce__() method properly when the member data are NumPy arrays. I'd like to stay away from NumPy arrays, as they always seem to be stored as Python objects and require calls to and from the Python API. I come from a C background, so I am very comfortable managing memory manually with malloc() and free(), and I am trying to keep Python interaction to an absolute minimum.

However, I have run into a problem. I need something equivalent to copy.deepcopy() for the class I am creating, callable from the Python script where the class will ultimately be used. I have found that the only good way to do this is to implement the pickle protocol for the class by writing a __reduce__() method. This is trivial for most primitives or Python objects, but I am at an absolute loss for how to do it for dynamically allocated C arrays. Obviously I can't return the pointer itself, because the underlying memory will have disappeared by the time the object is reconstructed. So what's the best way to do this? I'm sure this will require changes to the __reduce__() method as well as to __cinit__(), __init__(), or both.
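Here is why a __reduce__() method is enough to get copy.deepcopy() working. When a class defines no __deepcopy__, deepcopy falls back on the pickle protocol. It calls __reduce_ex__, which in turn calls an overridden __reduce__, and then builds the copy directly from the returned callable and arguments without producing a pickle. The pure-Python sketch below shows this. The class `Tracked` and the `calls` list are hypothetical and exist only to record that __reduce__ was used.

```python
import copy

calls = []  # hypothetical log, only used to observe that __reduce__ runs


class Tracked:
    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        calls.append("reduce")
        # (callable, args): deepcopy will call Tracked(self.value)
        return (Tracked, (self.value,))


original = Tracked(7)
clone = copy.deepcopy(original)
print(clone is original, clone.value, calls)
```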

I have read the Python documentation on pickling extension types found here, as well as just about every other question on Stack Overflow about pickling Cython classes, such as this question.

A condensed version of my class looks something like this:

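from libc.stdlib cimport malloc, free

cdef class Bin:
    cdef int* job_ids
    cdef int* jobs
    cdef int primitive_data

    def __cinit__(self):
        self.job_ids = <int*>malloc(40 * sizeof(int))
        self.jobs = <int*>malloc(40 * sizeof(int))
        if not self.job_ids or not self.jobs:
            raise MemoryError()

    def __init__(self, int val):
        self.primitive_data = val

    def __dealloc__(self):
        # the pointers must be accessed through self; free(NULL) is a no-op
        free(self.job_ids)
        free(self.jobs)

    def __reduce__(self):
        # trailing comma: the arguments must be a tuple, not a bare int
        # (this still only restores primitive_data, not the arrays)
        return (self.__class__, (self.primitive_data,))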
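One pitfall is easy to hit here. The second element returned by __reduce__ must itself be a tuple. Writing `(x)` gives you just `x` in parentheses, while `(x,)` is a one-element tuple. The pure-Python sketch below shows the difference. `Broken` and `Fixed` are hypothetical classes, and both PicklingError and TypeError are caught in case the exact exception type varies between Python versions.

```python
import pickle


class Broken:
    def __init__(self, val):
        self.val = val

    def __reduce__(self):
        return (self.__class__, (self.val))   # parentheses only: this is an int


class Fixed:
    def __init__(self, val):
        self.val = val

    def __reduce__(self):
        return (self.__class__, (self.val,))  # trailing comma: a 1-tuple


try:
    pickle.dumps(Broken(3))
    broken_ok = True
except (pickle.PicklingError, TypeError):
    broken_ok = False

restored = pickle.loads(pickle.dumps(Fixed(3)))
print(broken_ok, restored.val)
```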

I've also read this question, but it doesn't directly apply to _pickling_ C pointers to arrays. [Cython - converting pointers to arrays into Python objects](http://stackoverflow.com/questions/5271690/cython-converting-pointers-to-arrays-into-python-objects?rq=1) — MS-DDOS, Mar 30 '16 at 06:44
I think you need to serialise the data into a Python `bytes` object. Then use a rebuild function (e.g. http://stackoverflow.com/a/12647497/1300519) to cast back to an int array. I haven't managed to get this working myself yet, but I believe this is the correct approach. Not posting this as an answer until I have a working example. — Snorfalorpagus, Mar 30 '16 at 11:00

score 8 · Accepted Answer · answered Mar 30 '16 at 12:54

One approach is to serialise the data in your array into a Python bytes object. The __reduce__ method first calls the get_data method. That method casts the data pointer to <char*> and slices it to the array's size in bytes, which gives a bytes object (if you try to go there directly, Cython doesn't know how to do it). __reduce__ returns this object along with a reference to the rebuild function (a module-level function, not a method!). Pickle calls rebuild to recreate the instance, which it does via the set_data method. If you need to pass more than one array, as in your example, you just need to accept more arguments in rebuild and extend the tuple returned by __reduce__.
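The sketch below lets you check the mechanics without compiling anything. It applies the same pattern in pure Python to the two arrays in the question's Bin class. It is an analogue, not Cython:

- `array.array("i", ...)` stands in for each malloc'd `int*`.
- `tobytes()` stands in for the `<char*>` slice.
- a byte-level `memoryview` copy stands in for `memcpy`.

The name `rebuild_bin` is hypothetical.

```python
import array
import pickle

LENGTH = 40  # fixed element count, as with malloc(40 * sizeof(int)) in the question


class Bin:
    """Pure-Python stand-in for the cdef class; each array.array("i")
    plays the role of one malloc'd int buffer."""

    def __init__(self, primitive_data):
        self.primitive_data = primitive_data
        self.job_ids = array.array("i", [0] * LENGTH)
        self.jobs = array.array("i", [0] * LENGTH)

    def __reduce__(self):
        # One bytes object per buffer; a second array is just one more argument.
        return (rebuild_bin,
                (self.primitive_data, self.job_ids.tobytes(), self.jobs.tobytes()))


def rebuild_bin(primitive_data, job_ids_bytes, jobs_bytes):
    """Module-level rebuild function; pickle stores it by qualified name."""
    obj = Bin(primitive_data)
    expected = LENGTH * obj.job_ids.itemsize
    for buf in (job_ids_bytes, jobs_bytes):
        if len(buf) != expected:
            raise ValueError("expected %d bytes, got %d" % (expected, len(buf)))
    # Copy the raw bytes into the existing buffers, the Python analogue of
    # memcpy(self.job_ids, <char*>data, n) in the Cython version.
    memoryview(obj.job_ids).cast("B")[:] = job_ids_bytes
    memoryview(obj.jobs).cast("B")[:] = jobs_bytes
    return obj


original = Bin(7)
for n in range(LENGTH):
    original.job_ids[n] = n
    original.jobs[n] = 2 * n

restored = pickle.loads(pickle.dumps(original))
print(restored.primitive_data, restored.job_ids[5], restored.jobs[5])
```

Only the rebuild function is stored by reference in the pickle. The arguments (an int and two bytes objects) are ordinary picklable values, which is what makes the pattern work for raw buffers.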

I haven't done much testing on this, but it seems to work. It would probably explode if you passed it malformed data.

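from cpython.mem cimport PyMem_Malloc, PyMem_Free
from libc.string cimport memcpy

cdef int length = 40

cdef class MyClass:
    cdef long *data

    def __cinit__(self):
        self.data = <long*>PyMem_Malloc(sizeof(long)*length)
        if not self.data:
            raise MemoryError()

    cdef bytes get_data(self):
        # slicing a char* with an explicit length copies exactly that many
        # bytes into a new bytes object, zero bytes included
        return <bytes>(<char *>self.data)[:sizeof(long)*length]

    cdef int set_data(self, bytes data) except -1:
        # reject buffers of the wrong size instead of over- or under-reading
        if len(data) != sizeof(long)*length:
            raise ValueError("expected %d bytes, got %d"
                             % (sizeof(long)*length, len(data)))
        memcpy(self.data, <char*>data, sizeof(long)*length)
        return 0

    def set_values(self):
        # fill the array with dummy data: 0, 1, ..., length - 1
        cdef int n
        for n in range(0, length):
            self.data[n] = n

    def get(self, int i):
        # get the ith value of the data, with a bounds check
        if not 0 <= i < length:
            raise IndexError(i)
        return self.data[i]

    def __reduce__(self):
        data = self.get_data()
        return (rebuild, (data,))

    def __dealloc__(self):
        PyMem_Free(self.data)

cpdef object rebuild(bytes data):
    # typing c as MyClass lets Cython call the cdef method set_data on it
    cdef MyClass c = MyClass()
    c.set_data(data)
    return c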

Example usage (assuming MyClass is in hello.pyx):

import hello
import pickle

c1 = hello.MyClass()
c1.set_values()
print('c1', c1)
print('fifth item', c1.get(5))

d = pickle.dumps(c1)
del c1  # delete the original object

c2 = pickle.loads(d)
print('c2', c2)
print('fifth item', c2.get(5))
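A caveat applies to dumping raw memory this way. The bytes depend on sizeof(long) and on the machine's byte order. For example, a pickle written on 64-bit Linux (8-byte long) will not load correctly on 64-bit Windows (4-byte long). If pickles need to move between machines, one option is to pack the values into a fixed layout instead. The pure-Python sketch below does this with the standard struct module. It assumes 40 values that each fit in a signed 64-bit integer. `to_portable_bytes` and `from_portable_bytes` are hypothetical helper names. In the Cython class, the same format string would be applied to the values read from self.data.

```python
import struct

LENGTH = 40
# "<" = little-endian with standard sizes, "q" = signed 8-byte integer,
# so the byte layout is the same on every platform.
FMT = "<%dq" % LENGTH


def to_portable_bytes(values):
    return struct.pack(FMT, *values)


def from_portable_bytes(data):
    expected = struct.calcsize(FMT)
    if len(data) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(data)))
    return list(struct.unpack(FMT, data))


values = [n * (-1) ** n for n in range(LENGTH)]  # mixed signs, includes 0
payload = to_portable_bytes(values)
restored = from_portable_bytes(payload)
print(len(payload), restored[:4])
```

This adds a conversion step that the raw memcpy avoids. For data that never leaves one machine, the memcpy approach is simpler.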

You might have issues if your data contains 0s (bytes might null terminate early)? But the idea looks good to me. — DavidW, Mar 30 '16 at 13:23
@DavidW I did wonder about this, but it doesn't seem to be an issue. `memcpy` doesn't treat null bytes specially, unlike some of the other string functions (I think). I've tested it by setting the middle of the array in the example to 0s, and it seems OK. — Snorfalorpagus, Mar 30 '16 at 13:30
memcpy doesn't but I thought the bytes constructor might. If you've tested it then it's probably fine though! — DavidW, Mar 30 '16 at 13:41
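You can check the null-byte question from plain Python with ctypes, which exposes the same two behaviours:

- `ctypes.string_at(address, size)` copies exactly `size` bytes, much like slicing a `char*` with an explicit length in Cython.
- `ctypes.string_at(address)` without a size treats the memory as a NUL-terminated C string.

The sketch below uses a 40-element c_long array, as in the answer. The zeroed middle section mirrors the test described in the comment above. It is a ctypes analogue, not the Cython code itself.

```python
import ctypes

LENGTH = 40
Buffer = ctypes.c_long * LENGTH

src = Buffer(*range(LENGTH))
for n in range(10, 20):
    src[n] = 0  # zeros in the middle of the array

nbytes = ctypes.sizeof(src)

# With an explicit size, every byte is copied, embedded zeros included.
payload = ctypes.string_at(ctypes.addressof(src), nbytes)

# Without a size, the copy stops at the first zero byte (C-string semantics);
# here src[0] is 0, so this stops immediately.
truncated = ctypes.string_at(ctypes.addressof(src))

# memmove copies a fixed byte count and ignores zeros, like memcpy.
dst = Buffer()
ctypes.memmove(dst, payload, nbytes)
print(len(payload), len(truncated), dst[9], dst[15], dst[25])
```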
