
I am currently trying to use Python multiprocessing. The library I use is multiprocess (NOT multiprocessing).

I have the following code, which creates a number of computing jobs and runs them through a map operation:

pool = multiprocess.Pool(4)
all_responses = pool.map_async(wrapper_singlerun, range(10000))
pool.join()
pool.close()

However, whenever I run this snippet of code, I get the following error:

    pool.join()
  File "/Users/davidal/miniconda3/lib/python3.6/site-packages/multiprocess/pool.py", line 509, in join
    assert self._state in (CLOSE, TERMINATE)
AssertionError

Do you have any idea why this error happens? I used pool.map_async before, but figured that I need some kind of rendezvous call on the pool. Otherwise, my PC created something like a fork bomb, which spawned too many threads (at least, that's what I think it does...)

Any ideas are appreciated!

DaveTheAl
  • Do you have that code inside a `if __name__ == '__main__':` block, or just at the top level of your module? – abarnert May 29 '18 at 22:00
  • This piece of code is encoded within a class, which is instantiated within a few layers of functions. The initial function is indeed within `if __name__ == "__main__":` – DaveTheAl May 29 '18 at 22:03
  • The [`multiprocess` homepage](https://github.com/uqfoundation/multiprocess) links to [docs that don't exist](http://multiprocess.readthedocs.io/), which makes it pretty hard to debug anything that isn't the same as `multiprocessing`. But it seems like the only differences are that `multiprocess` (a) pre-monkeypatches `dill` in place of `pickle` and (b) may be a few versions behind. So… can you repro the same problem with `multiprocessing`? – abarnert May 29 '18 at 22:10
  • Apparently I have the same problem with `multiprocessing` (when I replace `pool = multiprocess.Pool(4)` with `pool = multiprocessing.Pool(4)`) – DaveTheAl May 30 '18 at 11:05
  • @abarnert, more specifically, the `multiprocessing` module gives me a pickling error, that's why I switched to the other library (as it uses dill) – DaveTheAl May 30 '18 at 12:20

1 Answer


The problem is that you're calling join before close.

multiprocess appears to be missing its documentation, but, as far as I can tell, it's basically a fork of the stdlib multiprocessing that pre-monkeypatches dill in for pickle, so the multiprocessing docs should be relevant here. (Also, in a comment, you said that you can repro the problem with multiprocessing.)

So, the docs for Pool.join say:

"Wait for the worker processes to exit. One must call close() or terminate() before using join()."

The close method is how you shut down the send side of the queue so new tasks can't be added. The join method is how you wait for everything on the queue to be processed. Waiting for the queue to drain before closing it wouldn't work, because as long as the queue is open, more tasks could still be added to it.

But you're calling close after join, instead of before. And the first thing join does is assert that you've already called close or terminate, which you haven't, hence the assertion failure.
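To see that check fire in isolation, here is a minimal sketch. The helper name `join_before_close` and its `pool_cls` parameter are made up for illustration. The traceback in the question shows an AssertionError on Python 3.6; as far as I know, later releases of the stdlib raise ValueError for the same mistake, so the sketch catches both rather than assuming one.

```python
import multiprocessing


def join_before_close(pool_cls=multiprocessing.Pool):
    """Call join() on a pool that is still running and report what happens.

    pool_cls is a hypothetical parameter so that another pool class with the
    same interface (e.g. multiprocess.Pool) can be swapped in.
    """
    pool = pool_cls(2)
    try:
        pool.join()  # wrong order: the pool has not been closed yet
    except (AssertionError, ValueError) as exc:
        # The pool refuses to join while it is still accepting tasks.
        return type(exc).__name__
    finally:
        # Clean up in the correct order: close first, then join.
        pool.close()
        pool.join()
    return None


if __name__ == "__main__":
    print(join_before_close())
```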

So, you probably just want to switch the order of those two calls.
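A minimal sketch of the corrected order, assuming a trivial `square` function as a stand-in for your `wrapper_singlerun` (the names `square`, `run_jobs`, and `pool_cls` are made up for illustration, not part of your code):

```python
import multiprocessing


def square(x):
    """Hypothetical stand-in for wrapper_singlerun."""
    return x * x


def run_jobs(n_jobs, n_workers=4, pool_cls=multiprocessing.Pool):
    """Submit n_jobs tasks, then shut the pool down in the right order."""
    pool = pool_cls(n_workers)
    async_result = pool.map_async(square, range(n_jobs))
    pool.close()  # first: no more tasks can be submitted
    pool.join()   # then: wait for the workers to finish and exit
    # The work is already done at this point, so get() does not block.
    return async_result.get()


if __name__ == "__main__":
    print(run_jobs(10))
```

Passing `pool_cls=multiprocess.Pool` should behave the same way, assuming multiprocess really does mirror the stdlib interface here.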

Or, alternatively, maybe you were confused about what join is for, and thought you needed to call it before you could use all_responses.get() or .wait(). If so, you don't need to do that: get blocks until the results are available, after which you don't need a join. This is actually the more common pattern, especially with map and friends (although the examples in the docs do it via a with Pool(…) as pool: instead of manually calling anything on the pool).
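A rough sketch of that second style, again with a made-up `square` function standing in for `wrapper_singlerun` (`run_jobs` and `pool_cls` are likewise hypothetical names). Note that the get has to happen inside the with block, because leaving the block terminates the pool:

```python
import multiprocessing


def square(x):
    """Hypothetical stand-in for wrapper_singlerun."""
    return x * x


def run_jobs(n_jobs, n_workers=4, pool_cls=multiprocessing.Pool):
    """Collect results with get() instead of calling close()/join()."""
    with pool_cls(n_workers) as pool:
        async_result = pool.map_async(square, range(n_jobs))
        # get() blocks until every task has finished. It must run before
        # the with block exits, since exiting calls pool.terminate().
        return async_result.get()


if __name__ == "__main__":
    print(run_jobs(10))
```

If you don't need to do anything else while the jobs run, a plain `pool.map(...)` in the same place returns the list directly.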

abarnert
  • @abarnert, this is very helpful. If this chunk of code is in a loop, since the pool is closed after every iteration, can I 'reopen' the pool at the beginning of the next iteration, or do I have to create a new `pool = multiprocessing.Pool(4)` every time? – Indominus Dec 24 '18 at 08:10