
Let's consider the following example:

from pathos.pools import ProcessPool

class A:
    def run(self, arg: int):

        my_list = list(...)
        my_dict = dict(...)

        def __run_parallel(arg: int):
            local_variable = 42

            # some code and read access...

            read_only1 = my_list[...]
            read_only2 = my_dict[...]


            # some code and write access...

            my_list.append(arg)
            my_dict[arg] = local_variable

        ProcessPool(4).map(__run_parallel, range(1000))

Since it seems that neither list nor dict can be safely shared between processes, I'm looking for a way to efficiently share access to these variables across all processes in the pool.

So far, I've tried passing my_list and my_dict as additional arguments to __run_parallel via proxies from pa.helpers.mp.Manager. It works, but it's horrendously slow, presumably because every access is routed through a separate manager process.
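For reference, the Manager-based attempt looks roughly like this (a minimal sketch using the standard library's multiprocessing, whose API pathos.helpers.mp mirrors; the names are my own). Every read and write on a proxy object is an IPC round-trip to the manager process, which is where the slowdown comes from:

```python
import multiprocessing as mp

def work(args):
    shared_list, shared_dict, arg = args
    shared_list.append(arg)       # each proxy operation is an IPC round-trip
    shared_dict[arg] = arg * 2

if __name__ == "__main__":
    with mp.Manager() as manager:
        shared_list = manager.list()
        shared_dict = manager.dict()
        with mp.Pool(4) as pool:
            pool.map(work, [(shared_list, shared_dict, i)
                            for i in range(100)])
        print(len(shared_list))   # 100
```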

After spending multiple evenings on trial and error, I'd like to ask whether anybody knows how to efficiently use a shared dict and list within __run_parallel using pathos.

p4dn24x
  • Could you use a `pathos.helpers.mp.Array` or work with the `numpy` `ctypes` interface more directly? You'd have to work with shared memory arrays, not lists and dicts. Arrays are intended for shared memory use, lists and dicts are not. The efficiency also depends on what you have in the list/dict. – Mike McKerns Jul 10 '20 at 11:24
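The numpy-over-ctypes route mentioned in the comment could look roughly like this (my own sketch: a raw shared ctypes buffer wrapped in a zero-copy numpy view; as the comment notes, whether this fits depends on what the list/dict holds):

```python
import multiprocessing as mp
import numpy as np

# lock=False returns a plain ctypes array that supports the buffer protocol
raw = mp.Array('d', 8, lock=False)
view = np.frombuffer(raw, dtype=np.float64)  # zero-copy view on shared memory
view[:] = np.arange(8)                       # writes land in the shared buffer
print(view.sum())  # 28.0
```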

1 Answer


Converting both the list and the dict to pathos.helpers.mp.Array, without an intermediate pa.helpers.mp.Manager, as suggested by @Mike McKerns, brought the desired performance boost.
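For completeness, a minimal sketch of the Array-based approach (shown with the standard library's multiprocessing.Array, which pathos.helpers.mp.Array wraps; the initializer pattern and the integer-keyed "dict as array" layout are my assumptions and only work when the keys are small non-negative integers):

```python
import multiprocessing as mp

N = 100
shared = mp.Array('i', N)   # 'i' = C int; created with a lock by default

def init(arr):
    # make the shared array visible as a global inside each worker process
    global shared
    shared = arr

def work(i):
    shared[i] = i * 2       # direct shared-memory write, no manager round-trip

if __name__ == "__main__":
    with mp.Pool(4, initializer=init, initargs=(shared,)) as pool:
        pool.map(work, range(N))
    print(shared[10])  # 20
```

Unlike a Manager proxy, reads and writes here hit shared memory directly; the trade-off is that the size and element type must be fixed up front.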
