My use case is to catch all the exceptions that occur in Python multiprocessing processes. With the help of some web searching, I came up with the following code:
import multiprocessing as mp
import traceback

class Process(mp.Process):
    def __init__(self, *args, **kwargs):
        mp.Process.__init__(self, *args, **kwargs)
        # Pipe for sending any exception from the child back to the parent
        self._pconn, self._cconn = mp.Pipe()
        self._exception = None

    def run(self):
        try:
            mp.Process.run(self)
            self._cconn.send(None)
        except Exception as e:
            tb = traceback.format_exc()
            self._cconn.send((e, tb))

    @property
    def exception(self):
        # Poll the parent end of the pipe for anything the child sent back
        if self._pconn.poll():
            self._exception = self._pconn.recv()
        return self._exception
And then in my main code, I catch the errors as follows:
for p in jobs:
    p.join()
    if p.exception:
        error, tb = p.exception   # renamed so it doesn't shadow the traceback module
        print(tb)
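For reference, here is a minimal, self-contained example of how I expect the class to behave; failing_worker is just a toy stand-in for illustration, not my real workload (it assumes the Process subclass above is in scope):

# Toy worker used only to demonstrate the exception plumbing
def failing_worker():
    raise ValueError("something went wrong in the child")

if __name__ == "__main__":
    p = Process(target=failing_worker)   # the subclass defined above
    p.start()
    p.join()
    if p.exception:
        error, tb = p.exception
        print("caught in parent:", repr(error))
        print(tb)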
This works pretty well for light jobs (say, 1000 jobs finishing in 10 minutes). But when the jobs are heavy (1000 jobs taking about 3 hours), subsequent runs become slow. There are two things I have observed and am seeking a solution to:
1) While the code is running, a lot of processes are created (visible when I check top on the command line); most of them do nothing and do not slow down the runtime. Is this a problem I am ignoring?
2) When I run a heavy job, all subsequent heavy jobs are delayed. This seems to be because some processes remaining from the old heavy job are still running and share the machine's load with the new run (see the diagnostic sketch below).
Does the code give any hint as to why my runs are not foolproof and show variable run times?
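To check whether point 2 is really caused by stray workers, I am considering adding a diagnostic like the one below after the join loop. mp.active_children() is a standard-library call, but whether it will actually reveal the culprit is my assumption:

# After joining all jobs, list any child processes that are still alive.
# active_children() also reaps finished children as a side effect.
stragglers = mp.active_children()
for child in stragglers:
    print("still alive:", child.name, child.pid)
    # child.terminate()  # possible cleanup, but I have not verified this fixes the slowdown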
Please feel free to ask for specific details and I will provide them; I am not sure what all might be needed, so I am starting with a generalized query.
EDIT 1: Here is how I create the processes:
jobs = []
for i in range(num_processes):
    p = Process(target=func, args=(inqueue, outqueue))
    jobs.append(p)
    p.start()
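For completeness, this is how I was thinking of shutting everything down at the end of a run; the timeout value and the terminate()/close() calls are guesses on my part rather than something I have confirmed helps:

# Tear down the workers explicitly at the end of a run (sketch, not verified)
for p in jobs:
    p.join(timeout=60)        # give each worker up to a minute to finish
    if p.is_alive():
        p.terminate()         # force-kill a worker that refuses to exit
        p.join()
    p.close()                 # release the process object's resources (Python 3.7+)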