
My code does something like this:

from pathos.multiprocessing import ProcessingPool

def myFunc(something):
    # Result.cores holds the number of worker processes in my real code
    thispool = ProcessingPool(nodes=Result.cores)
    # someFunction is a stand-in name for the per-item worker
    listOfResults = thispool.map(someFunction, something)
    return listOfResults

for i in range(1000):
    myFunc(range(10))  # some iterable of inputs on each iteration

Now, in my actual, more involved code, memory usage just kept growing. The computation itself should need almost nothing, but when I run it with 12 cores, each of the 12 worker processes initially takes almost 1 MB of memory, yet over a runtime of several hours each of them grows to several GB.

So I figured the pool was leaking memory, and that I had better close it after each iteration:

def myFunc(something):
    thispool = ProcessingPool(nodes=Result.cores)
    listOfResults = thispool.map(someFunction, something)
    thispool.close()
    thispool.join()
    return listOfResults

However, now, after several iterations, I get

ValueError: Pool not running

at the thispool.map() line. If I create a new pool

test = ProcessingPool(nodes=4)

and try to run test.map(), I get the same error. That is odd, since I initialized a brand-new variable... does pathos.multiprocessing.ProcessingPool keep a single, unique process pool, so that closing one closes them all?

What's the correct way of implementing a pathos.multiprocessing.ProcessingPool inside a loop, without memory leakage?

When I instead use multiprocessing.Pool, the problem does not arise.
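For reference, here is the per-iteration pattern with the standard library that works for me (with a toy square() function as a stand-in for the real work); each with block closes and joins its pool on exit, so the fresh pools never collide:

```python
from multiprocessing import Pool

def square(x):
    # stand-in for the real per-item work
    return x * x

def myFunc(values):
    # the "with" block terminates the pool when it exits,
    # so each iteration gets a genuinely fresh pool
    with Pool(processes=4) as thispool:
        return thispool.map(square, values)

if __name__ == "__main__":
    for _ in range(5):
        listOfResults = myFunc(range(10))
    print(listOfResults)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```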

1 Answer

It turns out that indeed, through some backend magic, pathos prevents multiple instances of the same type of pool from being initialized: constructing a "new" ProcessingPool hands back the same cached pool, which is why closing one closes them all.

To prevent the leak, call the following at the end of each iteration:

thispool.terminate()  # stop the workers, freeing their memory
thispool.restart()    # bring the cached pool back up for the next map() call