This is not a complete answer, but the source can help guide us. When you pass maxtasksperchild to Pool, it saves the value as self._maxtasksperchild and only uses it when creating a worker object:
def _repopulate_pool(self):
    """Bring the number of pool processes up to the specified number,
    for use after reaping workers which have exited.
    """
    for i in range(self._processes - len(self._pool)):
        w = self.Process(target=worker,
                         args=(self._inqueue, self._outqueue,
                               self._initializer,
                               self._initargs, self._maxtasksperchild)
                         )
        ...
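For reference, that value comes straight from the Pool constructor, so supplying it looks like this; a minimal sketch, where the 4 and 10 are arbitrary numbers chosen only for illustration:
import multiprocessing

# four workers at a time, each retired and replaced after ten tasks
pool = multiprocessing.Pool(processes=4, maxtasksperchild=10)
pool.close()
pool.join()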
This worker object uses maxtasksperchild like so:
assert maxtasks is None or (type(maxtasks) == int and maxtasks > 0)
which only validates the value rather than imposing any limit on the number of tasks, and
while maxtasks is None or (maxtasks and completed < maxtasks):
    try:
        task = get()
    except (EOFError, IOError):
        debug('worker got EOFError or IOError -- exiting')
        break
    ...
    put((job, i, result))
    completed += 1
essentially handing the result of each task back on the output queue and counting how many tasks the worker has completed. While you could run into memory issues by accumulating too many results, you can hit the same error just by building an overly large list in the first place. In short, the source does not suggest any limit on the number of tasks possible, as long as the results fit in memory once they are returned.
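To see what the counter actually bounds, a quick sketch that records which process id handles each task is enough; with two workers and maxtasksperchild=3, more than two distinct pids show up, yet all twelve tasks complete (the numbers here are arbitrary, and chunksize=1 just keeps one item per task):
import multiprocessing, os

def which_pid(x):
    # each task simply reports the pid of the worker that ran it
    return os.getpid()

if __name__ == '__main__':
    # two workers at a time, each retired after three tasks
    pool = multiprocessing.Pool(processes=2, maxtasksperchild=3)
    pids = pool.map(which_pid, range(12), chunksize=1)
    pool.close()
    pool.join()
    # at least four distinct pids: workers were replaced along the way,
    # yet all twelve tasks ran -- maxtasksperchild bounds tasks per
    # worker, not tasks per pool
    print sorted(set(pids))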
Does this answer the question? Not entirely. However, on Ubuntu 12.04 with Python 2.7.5 the following code, while inadvisable, seems to run just fine for any large max_tasks value. Be warned that it seems to take dramatically longer to run as that value grows:
import multiprocessing, time

max_tasks = 10**3

def f(x):
    print x**2
    time.sleep(5)
    return x**2

# note: this creates a pool with max_tasks worker processes,
# one per task submitted below
P = multiprocessing.Pool(max_tasks)
for x in xrange(max_tasks):
    P.apply_async(f, args=(x,))
P.close()
P.join()
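For comparison, a pool sized to the machine will happily chew through a far larger number of tasks without spawning one worker per task; a rough sketch, where the 10**5 figure is arbitrary:
import multiprocessing

def square(x):
    return x * x

if __name__ == '__main__':
    # a fixed-size pool fed many more tasks than it has workers
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    results = pool.map(square, xrange(10**5))
    pool.close()
    pool.join()
    print len(results)   # 100000 -- no per-pool task limit was hit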