
I have been using Python's multiprocessing library with a pool of workers. I implemented the following code:

import time
from multiprocessing import Pool

import main1

t1 = time.time()
p = Pool(cores)
result = p.map(main1, client_list[client])
if result == []:
    return []
p.close()
p.join()
print "Time taken in performing request:: ", time.time() - t1
return sorted(result)

However, after running the process for a while, I get a lot of background processes from my app still running. Here is a snapshot after running `ps aux` for my app:

Snapshot showing all the zombie processes

Now, I have read a lot of similar questions on Stack Overflow, like how to kill zombie processes created by multiprocessing module?, which call for using .join(), which I have already implemented, and I learned how to kill all these processes from Python Multiprocessing Kill Processes. But I want to know what could possibly go wrong in my code. I won't be able to share all of the code in the main1 function, but I have wrapped the entire code block in a try/except block to avoid cases where an error in the main code could lead to zombie processes.

import traceback

def main1((param1, param2, param3)):
    resout = []
    try:
        resout.append(some_data)  # resout in case of no error
    except:
        print traceback.format_exc()
        resout = []  # sending empty resout in case of error
    return resout

I'm still very new to the concept of parallel programming, and debugging issues with it is turning out to be tricky. Any help will be greatly appreciated.

  • Unfortunately the code you posted does not help much in diagnosing the problem. Too many unexplained variables and, more importantly, it's not clear from what you posted how the code is getting called, and what's happening after your function returns. My initial impression is that you are creating many pools in a loop, instead of reusing a pool many times. But I can't really be sure. – justinpawela Jun 03 '15 at 18:42
  • [You should structure your code like this.](http://pymotw.com/2/multiprocessing/communication.html#process-pools) If you have lots of work to do, you should use the same pool over and over (you should only ever call `Pool()` once). When you are finally done with the worker processes, calling `close` and `join` are important - they are what signals the processes to terminate; they are not just for aborting zombies. In your first code block above, if `results` is empty, you never terminate the workers, you just `return` to whatever code was the caller. – justinpawela Jun 03 '15 at 18:51
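
Following up on the comments above, here is a minimal sketch of the suggested restructuring, so that `close()` and `join()` always run even when the result is empty. The wrapper name `handle_request` is made up for illustration, and `cores`, `client_list`, `client` and `main1` are assumed to come from the question's surrounding code:

from multiprocessing import Pool

# Hypothetical wrapper around the question's snippet; cores, client_list,
# client and the main1 worker function are assumed to be defined elsewhere.
def handle_request(client):
    p = Pool(cores)
    try:
        result = p.map(main1, client_list[client])
    finally:
        # Always shut the workers down, even if map() raises or returns nothing.
        p.close()
        p.join()
    if not result:
        return []
    return sorted(result)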

1 Answer


The most common problem is that the pool is created but never closed.

The best way I know to guarantee that the pool is closed is to use a try/finally clause:

pool = Pool(ncores)
try:
    pool.map(yourfunction, arguments)
finally:
    pool.close()
    pool.join()
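
As a side note, on Python 3.3 and later `multiprocessing.Pool` also works as a context manager (the `with` form that comes up in the comments). Be aware that `Pool.__exit__` calls `terminate()` rather than `close()`, so a minimal sketch that still lets the workers finish cleanly, reusing the placeholder names above, would be:

from multiprocessing import Pool

# Sketch assuming Python 3.3+, where Pool supports the context-manager protocol;
# yourfunction, arguments and ncores are the same placeholders as above.
with Pool(ncores) as pool:
    results = pool.map(yourfunction, arguments)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the worker processes to finish
# Leaving the block calls terminate(), which is harmless here because the
# workers have already exited.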

If you don't want to struggle with multiprocessing, I wrote a simple package named parmap that wraps multiprocessing to make my life (and potentially yours) easier.

pip install parmap

import parmap
parmap.map(yourfunction, arguments)

From the parmap usage section:

  • Simple parallel example:

    import parmap
    y1 = [myfunction(x, argument1, argument2) for x in mylist]
    y2 = parmap.map(myfunction, mylist, argument1, argument2)
    y1 == y2
    
  • Iterating over a list of tuples:

    # You want to do:
    z = [myfunction(x, y, argument1, argument2) for (x,y) in mylist]
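    # In parallel: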
    z = parmap.starmap(myfunction, mylist, argument1, argument2)
    
    
    # You want to do:
    listx = [1, 2, 3, 4, 5, 6]
    listy = [2, 3, 4, 5, 6, 7]
    param1 = 3.14
    param2 = 42
    listz = []
    for (x, y) in zip(listx, listy):
        listz.append(myfunction(x, y, param1, param2))
    # In parallel:
    listz = parmap.starmap(myfunction, zip(listx, listy), param1, param2)
    
  • This is actually so convenient for developers who do not want to deal with the mess associated with multiprocessing. – ankits Sep 10 '15 at 14:14
  • I don't think this answers the question. I'm running into the same issue but I only ever open one Pool, but my iterator is very long. Even setting maxtasksperchild=10 still results in zombie processes. Shouldn't those processes be killed after running 10 tasks? I'm using the context handler `with` so the closing is done correctly. – Samuel Prevost Jul 08 '21 at 07:00
  • @SamuelPrevost, In my answer I was just giving the most common cause for zombie processes. There may be others. Why are you having zombie processes? It's hard to know without seeing a reproducible example, maybe you should open a new question. – zeehio Jul 12 '21 at 07:09
  • @zeehio thanks for helping me. Fortunately I found my mistake: I was using `htop` to monitor the processes, but by default it not only displays processes but also threads, hence many more "processes", mostly inactive. After tweaking the display options of `htop`, I disabled user-land threads and now everything is good. I noticed because `ps -ax` wasn't showing me those "zombie processes" – Samuel Prevost Jul 12 '21 at 11:05