My use case is to catch all exceptions that occur in Python multiprocessing worker processes. With the help of some web searching, I came up with the following code:

import multiprocessing as mp
import traceback

class Process(mp.Process):
    def __init__(self, *args, **kwargs):
        mp.Process.__init__(self, *args, **kwargs)
        # Pipe for sending any exception raised in the child back to the parent
        self._pconn, self._cconn = mp.Pipe()
        self._exception = None

    def run(self):
        try:
            mp.Process.run(self)
            self._cconn.send(None)
        except Exception as e:
            tb = traceback.format_exc()
            self._cconn.send((e, tb))

    @property
    def exception(self):
        # Read on the parent side; returns (exception, traceback) or None
        if self._pconn.poll():
            self._exception = self._pconn.recv()
        return self._exception

And then in my main code, I catch the errors as follows:

for p in jobs:
    p.join()
    if p.exception:
        error, tb = p.exception  # renamed so it does not shadow the traceback module
        print(tb)

This works pretty well for light jobs (say, 1000 jobs finishing in 10 minutes), but when the jobs are heavy (1000 jobs taking about 3 hours), subsequent runs are slow. There are two things I have observed and am seeking a solution for:

1) While the code is running, a lot of processes are created (I can see them when I check top on the command line); most of them do nothing and do not slow down the runtime. Is this a problem I am ignoring?

2) When I run a heavy job, all subsequent heavy jobs are delayed. This seems to be because processes left over from the old heavy job keep sharing the load with the new run.

Does the code give any hint as to why my runs are not fool-proof and show such variable run times?
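
To check whether leftover children really are the culprit, I imagine running something like the following at the end of each run. This is only a diagnostic sketch; note that multiprocessing.active_children() also has the side effect of joining any children that have already finished:

import multiprocessing as mp

# Diagnostic: list any child processes still alive after all jobs have been
# joined; active_children() also reaps (joins) children that already exited.
leftovers = mp.active_children()
print(len(leftovers), "child processes still alive")
for child in leftovers:
    print(child.name, child.pid, child.is_alive())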

Please feel free to ask for specific details and I will provide them; I am not sure what all might be needed, so I am starting with a general query.

EDIT 1: Here is how I create the processes:

jobs = []
for i in range(num_processes):
    p = Process(target=func, args=(inqueue, outqueue))
    jobs.append(p)
    p.start()
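
For completeness, here is a minimal sketch of how the pieces fit together with the Process subclass above. The worker body, the task count, and the sentinel handling are placeholders for my real code, and it assumes the workers succeed so that the number of expected results is known:

import multiprocessing as mp

def func(inqueue, outqueue):
    # Placeholder worker: consume tasks until a None sentinel arrives and
    # push one result per task onto the output queue.
    for item in iter(inqueue.get, None):
        outqueue.put(item * item)

if __name__ == '__main__':
    num_processes = 4
    num_tasks = 100
    inqueue, outqueue = mp.Queue(), mp.Queue()

    jobs = []
    for i in range(num_processes):
        p = Process(target=func, args=(inqueue, outqueue))  # subclass from above
        jobs.append(p)
        p.start()

    for item in range(num_tasks):
        inqueue.put(item)
    for _ in jobs:
        inqueue.put(None)  # one sentinel per worker

    # Drain the output queue *before* joining: a child cannot exit while
    # unflushed data is still sitting in its queue buffer.
    results = [outqueue.get() for _ in range(num_tasks)]

    for p in jobs:
        p.join()
        if p.exception:
            error, tb = p.exception
            print(tb)
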
  • This sounds like you have zombie processes if they're hanging around after the job completes. Also, stating that you see "many processes created" makes it seem as though you get more than you expected, yet you have explicit control of how many `mp.Process` are started. – roganjosh Apr 27 '17 at 11:25
  • Do you see anything in the code that might be creating the zombie processes? That is exactly what I want to know. Or, how do I get rid of zombie processes? I am creating N processes, but I see many more than N, most of which do nothing. I am adding the code for how I create them to the main question. – manav Apr 27 '17 at 11:35
  • That's less easy to answer. One route for a zombie process is that pipes/queue buffers can be overwhelmed when you try to add data (even a queue with an infinite size has an upper limit on how much data can be put into the queue in one go). – roganjosh Apr 27 '17 at 11:38
  • I see. I do have a queue that stores the data created by the various processes and passes it to a single process which then writes it out. This can total up to 2-10 GB. Let's say this is unavoidable. What should I do after I have run the heavy job to make sure the zombies are dead? – manav Apr 27 '17 at 11:42
  • Well, there are several approaches you can search on google e.g. [this](http://stackoverflow.com/questions/19322129/how-to-kill-zombie-processes-created-by-multiprocessing-module) to find one that suits. However, I think that's back-to-front. You should address what creates the zombies in the first place. Like I said, the queue can hold infinite data, but you cannot `put` infinite data in a queue in a single transaction. So can you not rework how you place data in the queue? That is, ofc, assuming this is the actual issue. – roganjosh Apr 27 '17 at 11:51
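
EDIT 2: Following the last comment, here is a sketch of how I could rework placing the big payloads on the queue, sending them in smaller chunks instead of one huge put(). The chunk size and the bytes payload are placeholders for my real data:

CHUNK = 64 * 1024 * 1024  # 64 MB per put(); an arbitrary placeholder size

def send_in_chunks(queue, payload):
    # Split a large bytes payload into fixed-size chunks so no single put()
    # pushes gigabytes through the queue at once; the writer process can
    # start consuming while the rest is still being produced.
    for start in range(0, len(payload), CHUNK):
        queue.put(payload[start:start + CHUNK])
    queue.put(None)  # sentinel: end of this payload

def receive_chunks(queue):
    # In the writer process: reassemble (or stream to disk) until the sentinel.
    return b"".join(iter(queue.get, None))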
