
My code does something like this:

from pathos.multiprocessing import ProcessingPool

def myFunc(something):
    # Result.cores holds the number of worker processes in my real code
    thispool = ProcessingPool(nodes=Result.cores)
    # someFunction is a stand-in name for the per-item worker
    listOfResults = thispool.map(someFunction, something)
    return listOfResults

for i in range(1000):
    myFunc(range(10))  # some iterable of inputs on each iteration

Now, in my actual, more involved code, memory usage just kept growing. The computation itself should need almost nothing, but when I run it with 12 cores, each of the 12 worker processes initially takes almost 1 MB of memory, yet over a runtime of several hours each of them grows to several GB.

So I figured the pool was leaking memory, and that I had better close it after each iteration:

def myFunc(something):
    thispool = ProcessingPool(nodes=Result.cores)
    listOfResults = thispool.map(someFunction, something)
    thispool.close()
    thispool.join()
    return listOfResults

However, now, after several iterations, I get

ValueError: Pool not running

at the thispool.map() line. If I create a new pool

test = ProcessingPool(nodes=4)

and try to run test.map(), I get the same error. That is odd, since I initialized a brand-new variable... does pathos.multiprocessing.ProcessingPool keep a single, unique process pool, so that closing one closes them all?

What's the correct way of implementing a pathos.multiprocessing.ProcessingPool inside a loop, without memory leakage?

When I instead use multiprocessing.Pool, the problem does not arise.
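For reference, here is the per-iteration pattern with the standard library that works for me (with a toy square() function as a stand-in for the real work); each with block closes and joins its pool on exit, so the fresh pools never collide:

```python
from multiprocessing import Pool

def square(x):
    # stand-in for the real per-item work
    return x * x

def myFunc(values):
    # the "with" block terminates the pool when it exits,
    # so each iteration gets a genuinely fresh pool
    with Pool(processes=4) as thispool:
        return thispool.map(square, values)

if __name__ == "__main__":
    for _ in range(5):
        listOfResults = myFunc(range(10))
    print(listOfResults)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```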

1 Answer

It turns out that indeed, through some backend magic, pathos prevents multiple instances of the same type of pool from being initialized: constructing a "new" ProcessingPool hands back the same cached pool, which is why closing one closes them all.

To prevent the leak, call the following at the end of each iteration:

thispool.terminate()  # stop the workers, freeing their memory
thispool.restart()    # bring the cached pool back up for the next map() call