
I need to do some work in multiple processes with Python 3.6. Namely, I have to update a dict by adding lists of objects. Since these objects are unpicklable, I need to use dill instead of pickle and multiprocess from pathos instead of multiprocessing, but this should not be the problem.
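
For context, a minimal sketch of why dill matters here: plain pickle refuses things like lambdas, while dill can serialize them (the lambda below is only a stand-in for my real unpicklable objects):

import pickle
import dill

f = lambda x: x + 1            # stand-in for an unpicklable object

try:
    pickle.dumps(f)            # plain pickle cannot handle lambdas
except (pickle.PicklingError, AttributeError) as e:
    print('pickle failed:', e)

g = dill.loads(dill.dumps(f))  # dill round-trips it fine
print(g(41))                   # 42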

Adding a list to the dictionary requires reserializing the list before it is added to the dictionary. This slows everything down so much that it takes the same time as without multiprocessing. Could you suggest a workaround?

This is my code with Python 3.6: init1 works but is slow, init2 is fast but broken. The rest is only for testing purposes.

import time

def init1(d: dict):
    # Build each list locally, then assign it to the managed dict:
    # the complete list is serialized and sent to the manager process,
    # which works but is slow.
    for i in range(1000):
        l = []
        for k in range(i):
            l.append(k)
        d[i] = l

def init2(d: dict):
    # Assign the empty list first, then append to the local copy:
    # fast, but the appends never reach the managed dict, so every
    # value ends up as an empty list.
    for i in range(1000):
        l = []
        d[i] = l
        for k in range(i):
            l.append(i)

def test1():
    import multiprocess as mp
    with mp.Manager() as manager:
        d = manager.dict()
        p = mp.Process(target=init1, args=(d,))
        p.start()
        p.join()
        print(d)

def test2():
    import multiprocess as mp
    with mp.Manager() as manager:
        d = manager.dict()
        p = mp.Process(target=init2, args=(d,))
        p.start()
        p.join()
        print(d)

start = time.time()
test1()
end = time.time()
print('test1: ', end - start)


start = time.time()
test2()
end = time.time()
print('test2: ', end - start)
  • Might be unrelated, but I think you're probably risking hitting a bug by having nested loops both use `i` - and `d[i]` is unclear here. – match Feb 10 '18 at 11:35
  • @match This is only for testing, change it if you want :) – fortea Feb 10 '18 at 11:37
  • `init2` also references `l` which isn't defined in that method. Please fix up this code to actually be what you are testing, and be clear about what you want in `init1` – match Feb 10 '18 at 11:42
  • See the subtle difference in loops, and the major difference in results, here: https://eval.in/953752 – match Feb 10 '18 at 11:57
  • fixed, sorry for huge output, it is needed to highlight timing differences – fortea Feb 10 '18 at 12:13
  • For speed testing, you might want to use the `timeit` module - this runs a set piece of code thousands of times and reports on the time it took. – match Feb 10 '18 at 12:17
  • `timeit` example here to get you started: https://pastebin.com/DtyDeiJK – match Feb 10 '18 at 12:23
  • using jupyter `%%timeit` I get: *1.14 s ± 34.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)* for `test1()` and *203 ms ± 2.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)* for `test2()`. Your code instead output this: `1.4....` for both the tests... – fortea Feb 10 '18 at 13:45
  • Here is a possible explanation: https://stackoverflow.com/questions/8619167/inconsistency-between-time-and-timeit-in-ipython – fortea Feb 10 '18 at 13:52
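
Following the timeit suggestion in the comments above, a rough sketch of timing the two tests with the standard-library timeit module instead of time.time() (this assumes test1 and test2 from the snippet above are in scope; number=3 is an arbitrary choice):

import timeit

# Run each test a few times and report the average time per call.
n = 3
print('test1:', timeit.timeit(test1, number=n) / n)
print('test2:', timeit.timeit(test2, number=n) / n)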

1 Answer


A possible solution using pipes. On my PC this takes 870 ms, compared to 1.10 s for test1 and 200 ms for test2.

def init3(child_conn):
    # Build a plain local dict, then send the whole thing once over the pipe.
    d = {}
    for i in range(1000):
        l = []
        for k in range(i):
            l.append(i)
        d[i] = l
    child_conn.send(d)

def test3():
    import multiprocess as mp
    parent_conn, child_conn = mp.Pipe(duplex=False)
    p = mp.Process(target=init3, args=(child_conn,))
    p.start()
    d = parent_conn.recv()  # the dict is deserialized once, in a single receive
    p.join()

In Jupyter, using the %timeit magic, I get:

In [01]: %timeit test3()
872 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [02]: %timeit test2()
199 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [03]: %timeit test1()
1.09 s ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
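
If you still want the managed-dict interface, a possible middle ground (not benchmarked here; init4 and test4 are just illustrative names) is to build a plain dict inside the worker and push it to the proxy with a single update() call, so the data crosses the process boundary once instead of once per key:

def init4(d):
    # Build locally, then copy everything into the managed dict in one call.
    local = {}
    for i in range(1000):
        local[i] = list(range(i))
    d.update(local)

def test4():
    import multiprocess as mp
    with mp.Manager() as manager:
        d = manager.dict()
        p = mp.Process(target=init4, args=(d,))
        p.start()
        p.join()
        print(len(d))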