When using the multiprocessing library, how do I bind resources to specific processes?

Question

Say I have 50 processes, and I'm using them to operate on (say) 20000 different input values. (I'm using the pathos library, which I think operates similarly to the multiprocessing library in Python.)

import pathos.multiprocessing

pool = pathos.multiprocessing.ProcessingPool(nodes=50)  # 50 worker processes
results = pool.map(function, inputs)

I want to create one SQLAlchemy database engine for each process (but I don't have the resources to create one for each input value). Then I want all inputs that are processed using that process to work with the same database engine.

How can I do this?

You could batch your input and map each batch to a worker process; the worker opens one connection and then iterates over its batch, calling the function on each individual input with that same connection. — saq7, Jun 11 '16 at 19:43
Is pathos a requirement of yours? The Python standard `multiprocessing.Pool` makes this easy. I looked at the pathos implementation (I could not find the Pool documentation) and it seems it's not 1:1 with the standard one. — noxdafox, Jun 12 '16 at 08:18
@noxdafox Yeah, I figured out how to do it using the "multiprocess" library (you can use initializers) and moved away from pathos. Thanks! — Jessica, Jun 13 '16 at 22:39
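A minimal sketch of saq7's batching suggestion, using the standard-library `multiprocessing.Pool`. An in-memory `sqlite3` connection stands in for the SQLAlchemy engine, and `chunked`, `process_batch`, `run_batched` and the `SELECT ? + 1` query are hypothetical placeholders for the real work.

```python
from multiprocessing import Pool
import sqlite3


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_batch(batch):
    """Open one connection for the whole batch and reuse it for every input."""
    # Stand-in for a SQLAlchemy engine (assumption for illustration).
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical per-input work: ask the database to add 1.
        return [conn.execute("SELECT ? + 1", (value,)).fetchone()[0] for value in batch]
    finally:
        conn.close()


def run_batched(inputs, processes=4, batch_size=100):
    batches = chunked(list(inputs), batch_size)
    with Pool(processes=processes) as pool:
        per_batch = pool.map(process_batch, batches)
    # Flatten back into one result per input, in the original order.
    return [result for batch in per_batch for result in batch]


if __name__ == "__main__":
    print(run_batched(range(10), processes=2, batch_size=5))
```

This opens one connection per batch rather than per process. Choosing roughly as many batches as processes gets close to one connection per process, at the cost of less even load balancing.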

score 1 · Accepted Answer · answered Jul 09 '16 at 18:27

I'm the author of both pathos and multiprocess. It turns out that pathos actually uses multiprocess under the hood, though that may not be obvious. You can get the underlying Pool from pathos:

>>> import pathos
>>> pathos.pools._ProcessPool 
<class 'multiprocess.pool.Pool'>

The above is the raw Pool from multiprocess, so it takes the same keyword arguments as the standard multiprocessing.Pool, including initializer. pathos.pools.ProcessPool, on the other hand, is a higher-level wrapper with some additional features, but it does not (yet) expose all the keyword arguments of the lower-level Pool.
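Since `_ProcessPool` is the raw multiprocess Pool, it should accept the same `initializer` and `initargs` arguments as the standard-library Pool. A rough sketch under that assumption: `init_worker` sets up per-process state once, and every task run by that worker reuses it. The counter is a hypothetical placeholder for a real resource such as a database engine, and the `try`/`except` falls back to `multiprocessing.Pool` so the sketch still runs without pathos installed.

```python
import os

try:
    # Raw multiprocess Pool that pathos wraps; takes initializer/initargs.
    from pathos.pools import _ProcessPool as Pool
except ImportError:
    # Fallback when pathos is not installed: same constructor signature.
    from multiprocessing import Pool

_calls = None  # per-process state, set up once by init_worker


def init_worker(start):
    """Runs once in each worker process when it starts."""
    global _calls
    _calls = start


def work(x):
    """Every task in the same worker sees and updates that worker's state."""
    global _calls
    _calls += 1
    return os.getpid(), _calls, x * x


def run(inputs, processes=4):
    with Pool(processes=processes, initializer=init_worker, initargs=(0,)) as pool:
        return pool.map(work, inputs)


if __name__ == "__main__":
    for pid, call_number, square in run(range(8), processes=2):
        print(pid, call_number, square)
```

Within each PID the call numbers count up 1, 2, 3, ..., which shows the state was created once per process and then reused.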

score 0 · Answer 2 · answered Jun 14 '16 at 18:54

I figured out how to do this by using the multiprocess library instead of the pathos library. When creating a pool of processes, you can pass an initializer function, which runs once at the start of each worker process. In this initializer I created a database engine and assigned it to a variable declared with the `global` statement, so each process gets its own global engine. So now I have exactly one engine per process, and every input that process handles reuses it.
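A minimal sketch of this initializer approach, written against the standard-library `multiprocessing.Pool`, whose `initializer` and `initargs` arguments the multiprocess Pool shares. To keep it runnable without a database server, an in-memory `sqlite3` connection stands in for the SQLAlchemy engine; in the real code `init_worker` would presumably call `sqlalchemy.create_engine(...)` instead. `init_worker`, `process_input`, `run` and the query are hypothetical names.

```python
import multiprocessing
import os
import sqlite3

# One connection per worker process, created by init_worker.
# In the original setup this would be a SQLAlchemy engine.
_conn = None


def init_worker(db_path):
    """Runs once in each worker process, before it handles any inputs."""
    global _conn
    # Stand-in for: _engine = sqlalchemy.create_engine(db_url)
    _conn = sqlite3.connect(db_path)


def process_input(value):
    """Runs once per input and reuses this process's connection."""
    doubled = _conn.execute("SELECT ? * 2", (value,)).fetchone()[0]
    return os.getpid(), id(_conn), doubled


def run(inputs, processes=4, db_path=":memory:"):
    with multiprocessing.Pool(processes=processes,
                              initializer=init_worker,
                              initargs=(db_path,)) as pool:
        return pool.map(process_input, inputs)


if __name__ == "__main__":
    results = run(range(20), processes=4)
    pids = {pid for pid, _, _ in results}
    print(f"{len(results)} inputs handled by {len(pids)} processes")
```

Each result carries the worker's PID and the `id()` of its connection, so you can check that every input handled by the same process used the same connection. The initializer runs once per worker process; if you pass `maxtasksperchild`, workers are replaced periodically and each replacement runs the initializer again. Creating the engine inside the initializer, rather than in the parent before the pool starts, also avoids sharing one engine's connection pool across forked processes, which the SQLAlchemy documentation warns against.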
