
I used the parallel python (pp) package to parallelize a serial task on my 4-core laptop. Here is a quick summary of the script used to do the job. After initializing the parallel python job server, I split my task into 4 jobs and retrieved their results into a list.

import pp
ppservers = ()
job_server = pp.Server(ppservers = ppservers)

start = 1
end = 1000
parts = 4
step = (end-start)/parts + 1

jobs = []
for i in xrange(parts):
    # chunk boundaries for this job
    starti = start + i * step - 1
    endi = min(start + (i + 1) * step - 1, end)
    # functionName is the worker defined elsewhere; each job gets one chunk
    jobs.append(job_server.submit(functionName, (starti, endi)))

results = [job() for job in jobs]

What I noticed was that the for-loop itself ran fairly fast (within a few seconds), but retrieving the results (results = [job() for job in jobs]) took far too long (approximately 10 minutes).
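The two phases can be timed separately with something like the sketch below (reusing job_server, functionName, start, step, and parts from above; the print format is just one way to report it):

import time

t0 = time.time()
jobs = []
for i in xrange(parts):
    starti = start + i * step - 1
    endi = min(start + (i + 1) * step - 1, end)
    jobs.append(job_server.submit(functionName, (starti, endi)))
t1 = time.time()                      # submit() returns almost immediately
results = [job() for job in jobs]     # each job() blocks until that job has finished
t2 = time.time()
print "submit: %.2f s, retrieve: %.2f s" % (t1 - t0, t2 - t1)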

Could someone explain why this is and recommend a way to get around this problem? Thank you.


1 Answer


You are spawning 1000 processes, so that means 1000 python instances. For a small job like yours, that is going to slow you down by quite a lot. You don't want pp for this.

Worse still, if your ppservers are over the network (instead of local processes), then you not only have the overhead of making a socket connection, but also the overhead of sending code across the network to spawn a python instance on another computer. If you don't want to use sockets and the network, you can force pp to work only locally by setting ppservers=() (which it seems you are already doing).

pp also has to serialize your code, send it across processes, and then reconstitute the code object in the other process, which can also slow things down. I wouldn't expect 10 minutes, unless you are going across sockets or you are pegging your memory with the spawned python instances.

I'd recommend using threads or the multiprocessing library instead of pp in this case, since your function seems like it might be small.
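For example, here is a minimal sketch of the same 4-way split using the standard multiprocessing.Pool; functionName below (summing squares over a chunk) is only a hypothetical stand-in for your worker, so adjust its arguments and body to match your real function:

from multiprocessing import Pool

def functionName(bounds):
    # hypothetical worker: takes a (starti, endi) chunk and returns a result for it
    starti, endi = bounds
    return sum(x * x for x in xrange(starti, endi))

if __name__ == '__main__':
    start, end, parts = 1, 1000, 4
    step = (end - start) / parts + 1
    chunks = [(start + i * step, min(start + (i + 1) * step, end))
              for i in xrange(parts)]
    pool = Pool(processes=parts)              # one worker process per chunk
    results = pool.map(functionName, chunks)  # blocks until all chunks are done
    pool.close()
    pool.join()
    print results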

If you want a library that provides a nice abstraction over pp and multiprocessing, so you can pick which one to deploy for a particular job without otherwise changing your code, you can try pathos. pathos also provides defaults and tweaks for pp that help speed it up. You can then test which approach runs your function fastest and go with that.

>>> import pathos.pp as pp
>>> import pathos.multiprocessing as mp
>>>    
>>> def squared(x):
...   return x**2
... 
>>> pppool = pp.ParallelPythonPool()
>>> mppool = mp.ProcessingPool()
>>> 
>>> res = pppool.amap(squared, xrange(1000))
>>> sqd = mppool.map(squared, xrange(1000))
>>> sqd[:10], sqd[-10:]
([0, 1, 4, 9, 16, 25, 36, 49, 64, 81], [980100, 982081, 984064, 986049, 988036, 990025, 992016, 994009, 996004, 998001])
>>> 
>>> sq = res.get()
>>> sq[:10], sq[-10:]
([0, 1, 4, 9, 16, 25, 36, 49, 64, 81], [980100, 982081, 984064, 986049, 988036, 990025, 992016, 994009, 996004, 998001])
>>> 
>>> thpool = mp.ThreadingPool()
>>> s = thpool.imap(squared, xrange(1000))
>>> s = list(s)
>>> s[:10], s[-10:]
([0, 1, 4, 9, 16, 25, 36, 49, 64, 81], [980100, 982081, 984064, 986049, 988036, 990025, 992016, 994009, 996004, 998001])

Above, I'm doing multiprocessing with a blocking map, while doing pp with an asynchronous (non-blocking) map. Afterward, I do an iterator map with threads (leveraging the multiprocessing interface). BTW, pathos also provides connectivity to MPI and cluster schedulers (not shown above).

Get pathos here: https://github.com/uqfoundation

Mike McKerns
  • Thanks Mike for making the difference between pp and multiprocessing clear. I will do some testing to see how my code performs with both approaches. – user4279562 Jan 23 '15 at 14:29