I'm the pathos developer. It's not an oversight: you can't use chunksize when using pathos.pools.ProcessingPool. I wanted the map functions to have the same interface as Python's built-in map, and to do that, given the multiprocessing implementation, I had to choose between making chunksize a keyword and allowing *args and **kwds. I chose the latter.
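To make that trade-off concrete, here is a minimal serial sketch of the two candidate signatures. Both function names are hypothetical stand-ins written for illustration; neither is taken from pathos or multiprocessing, and no parallelism is involved.

```python
# Hypothetical stand-ins for the two signatures; not pathos code.

def map_chunksize_keyword(func, iterable, chunksize=None):
    """multiprocessing-style: a single iterable, chunksize as a keyword."""
    # chunksize is accepted but ignored in this serial sketch
    return [func(x) for x in iterable]

def map_star_args(func, *iterables, **kwds):
    """builtin-style: any number of iterables, consumed in lockstep."""
    # kwds is accepted but ignored in this serial sketch
    return list(map(func, *iterables))

# The builtin map accepts several iterables, one per function argument.
builtin_result = list(map(pow, [2, 3, 4], [5, 2, 3]))

# The *args form can mirror that call shape exactly...
star_result = map_star_args(pow, [2, 3, 4], [5, 2, 3])

# ...while the chunksize-keyword form takes exactly one iterable.
keyword_result = map_chunksize_keyword(lambda x: x * x, range(4), chunksize=10)
```

With a single positional slot for the iterable, the second argument list in the pow call would have nowhere to go, which is the conflict described above.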
If you want to use chunksize, there is _ProcessPool, which retains the original multiprocessing.Pool interface but has augmented serialization.
>>> import pathos
>>> p = pathos.pools._ProcessPool()
>>> p.map(lambda x:x*x, range(4), chunksize=10)
[0, 1, 4, 9]
>>>
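As a quick check of the interface being retained, the standard library's Pool.map signature can be inspected without starting any worker processes. This only looks at multiprocessing itself; that _ProcessPool matches it is taken from the statement above rather than verified here.

```python
import inspect
from multiprocessing.pool import Pool

# Inspect the unbound method; no pool (and no worker process) is created.
params = list(inspect.signature(Pool.map).parameters)
print(params)
```

The parameter list shows a single iterable followed by chunksize, in contrast to the builtin map, which accepts any number of iterables.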
I'm sorry you feel the documentation is lacking. The code is primarily a fork of multiprocessing from the Python standard library, and I didn't change the documentation where the functionality has been reproduced. For example, here I am recycling the standard library docs, as the functionality is the same:
>>> p = pathos.pools._ProcessPool()
>>> print(p.map.__doc__)
Equivalent of `map()` builtin
>>> import multiprocessing
>>> p = multiprocessing.Pool()
>>> print(p.map.__doc__)
Equivalent of `map()` builtin
>>>
... and in the cases where I have modified functionality, I did write new docs:
>>> p = pathos.pools.ProcessPool()
>>> print(p.map.__doc__)
run a batch of jobs with a blocking and ordered map
Returns a list of results of applying the function f to the items of
the argument sequence(s). If more than one sequence is given, the
function is called with an argument list consisting of the corresponding
item of each sequence.
>>>
Admittedly, the docs could be better, especially those inherited from the standard library. Please feel free to open a ticket on GitHub or, even better, a PR to extend the docs.