PySwig is a C++ wrapper that creates objects which, for unavoidable reasons, cannot be pickled by Python. I want to call a method on such an object across a large dataset (around 1M entries) in parallel.
When I create the object at module top level, I can run a function on multiple cores, so there seems to be no issue doing this in principle:
    from multiprocessing import Pool

    pyswig_obj = make_object()  # created once at module top level

    def call(x):
        return pyswig_obj.method(x)

    p = Pool(8)
    results = p.map(call, xs)
However, if I wrap this in a function, the default Python pickle cannot pickle the call() function (only top-level functions can be pickled), which means I can't include this in a library. I've tried using dill (via pathos) to bypass this, but this results in trying to pickle the PySwig object itself, which doesn't work.
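To make the failure concrete, here is a minimal sketch of the pattern that breaks. Pool has to pickle the function it sends to the workers, so pickling a nested function directly reproduces the error without needing PySwig at all. The names make_call and try_pickle_nested are hypothetical, and the doubling body stands in for the real method call.

```python
import pickle


def make_call():
    # Hypothetical stand-in for a library function that defines call() inside itself.
    def call(x):
        return x * 2  # placeholder for pyswig_obj.method(x)
    return call


def try_pickle_nested():
    """Return the name of the exception raised when pickling a nested function."""
    try:
        pickle.dumps(make_call())
    except (pickle.PicklingError, AttributeError) as exc:
        # The exact exception type depends on the Python version.
        return type(exc).__name__
    return None


if __name__ == "__main__":
    print(try_pickle_nested())
```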
In principle, one workaround would be to create the object once in each process rather than sharing it, since it's fairly lightweight, but I'm not sure how this is possible in Python.