You are spawning 1000 processes, so that means 1000 Python instances. For a small job like yours, that overhead will slow you down by quite a lot -- you don't want `pp` for this. Worse still, if your `ppservers` are over the network (instead of local processes), then you not only have the overhead of making a socket connection, but also the overhead of sending code across the network to spawn a Python instance on the other computer(s). If you don't want to use the socket and network connection, you can force `pp` to work only locally by setting `ppservers=()` (which it seems you are already doing). `pp` also has to serialize your code, send it across processes, and then reconstitute the code object in the other process -- which can also slow things down. Even so, I wouldn't expect 10 minutes unless you are going across sockets, or you are pegging your memory with the spawned Python instances.
I'd recommend using threads or the `multiprocessing` library instead of `pp` in this case, since your function seems like it might be small.
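To illustrate the thread-based route with only the standard library (this is a minimal sketch, not code from the question): `multiprocessing.dummy` exposes the same `Pool` API as `multiprocessing`, but backed by threads, so there is no interpreter-spawning overhead at all.

```python
# Thread pool from the stdlib multiprocessing library.
# multiprocessing.dummy mirrors the multiprocessing API, but uses threads.
from multiprocessing.dummy import Pool as ThreadPool

def squared(x):
    return x**2

pool = ThreadPool(4)  # 4 worker threads, not 1000 processes
result = pool.map(squared, range(1000))
pool.close()
pool.join()
print(result[:10])  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Swapping the import to `from multiprocessing import Pool` gives you the process-based version with the same code, which is handy for comparing the two.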
If you want a library that provides a nice abstraction over `pp` and `multiprocessing`, so you can pick and choose which to deploy for a particular job without otherwise changing your code, you can try `pathos`. `pathos` also provides defaults and tweaks for `pp` that help speed it up. Then you can test which way is fastest for your function, and go with that.
>>> import pathos.pp as pp
>>> import pathos.multiprocessing as mp
>>>
>>> def squared(x):
...   return x**2
... 
>>> pppool = pp.ParallelPythonPool()
>>> mppool = mp.ProcessingPool()
>>>
>>> res = pppool.amap(squared, xrange(1000))
>>> sqd = mppool.map(squared, xrange(1000))
>>> sqd[:10], sqd[-10:]
([0, 1, 4, 9, 16, 25, 36, 49, 64, 81], [980100, 982081, 984064, 986049, 988036, 990025, 992016, 994009, 996004, 998001])
>>>
>>> sq = res.get()
>>> sq[:10], sq[-10:]
([0, 1, 4, 9, 16, 25, 36, 49, 64, 81], [980100, 982081, 984064, 986049, 988036, 990025, 992016, 994009, 996004, 998001])
>>>
>>> thpool = mp.ThreadingPool()
>>> s = thpool.imap(squared, xrange(1000))
>>> s = list(s)
>>> s[:10], s[-10:]
([0, 1, 4, 9, 16, 25, 36, 49, 64, 81], [980100, 982081, 984064, 986049, 988036, 990025, 992016, 994009, 996004, 998001])
Above, I'm doing `multiprocessing` with a blocking map, while doing `pp` with an asynchronous (non-blocking) map. Afterward, I do an iterator map with threads (leveraging `multiprocessing`). By the way, `pathos` also provides connectivity to MPI and cluster schedulers (not shown above).
Get `pathos` here: https://github.com/uqfoundation