
I have tried to parallelize a for loop to speed up some code. Consider this:

from multiprocessing import Pool

results = []

def do_stuff(str):
    print str
    results.append(str)

p = Pool(4)
p.map(do_stuff, ['str1','str2','str3',...]) # many strings here ~ 2000
p.close()

print results

I have some debug messages printing from do_stuff to keep track of how far the program gets before dying. It seems to die at a different point each time through. For example, it will print 'str297' and then just stop running; I see all the CPUs stop working and the program just sits there. There should be some error occurring, but no error message is shown. Does anyone know how to debug this problem?

UPDATE

I tried reworking the code a little. Instead of using the map function, I tried the apply_async function like this:

pool = Pool(5)
# results = pool.map(do_sym, underlyings[0::10])  # the earlier map-based attempt
results = []
for sym in underlyings[0::10]:
    r = pool.apply_async(do_sym, [sym])
    results.append(r)

pool.close()
pool.join()

for result in results:
    print result.get(timeout=1000)

This worked just as well as the map function, but ended up hanging in the same way: it never gets to the for loop where the results are printed.
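For what it's worth, one variation I may try (just a sketch, reusing my real names `do_sym` and `underlyings`, which are defined elsewhere) is to fetch each result with a short timeout before calling `pool.join()`, so that a task which never finishes should surface as a `multiprocessing.TimeoutError` naming its input, instead of the whole program blocking forever:

import multiprocessing
from multiprocessing import Pool

# Sketch only: do_sym and underlyings are the same names used above and
# are assumed to be defined elsewhere in the program.
pool = Pool(5)
async_results = [(sym, pool.apply_async(do_sym, [sym])) for sym in underlyings[0::10]]
pool.close()

for sym, r in async_results:
    try:
        # get() re-raises any exception from the worker and raises
        # multiprocessing.TimeoutError if the task is not done in time.
        print(r.get(timeout=60))
    except multiprocessing.TimeoutError:
        print('task for {} did not finish'.format(sym))

pool.join()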

After working on this a little more and trying the debug logging suggested in unutbu's answer, I will give some more info here. The problem is very strange. It seems like the pool is just hanging there, unable to close and continue the program. I use the PyDev environment for testing my programs, but I thought I would try just running Python in the console. In the console I get the same behavior, but when I press Ctrl+C to kill the program, I get some output which might explain where the problem is:

KeyboardInterrupt
^CProcess PoolWorker-47:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process PoolWorker-48:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process PoolWorker-45:
Process PoolWorker-46:
Process PoolWorker-44:
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    racquire()
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
KeyboardInterrupt
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
    self.run()
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    self._target(*self._args, **self._kwargs)
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    racquire()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
KeyboardInterrupt
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
    task = get()
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
    racquire()
    return recv()
    racquire()
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt

Even then, the program never actually dies; I end up having to close the terminal window to kill it.

UPDATE 2

I narrowed the problem down to a MySQL database transaction inside the function that runs in the pool. I was using the MySQLdb package before; I switched that call to a pandas.read_sql function, and it is working now.
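In case it helps anyone else: my guess (and it is only a guess) is that the MySQL connection I opened before creating the pool was inherited by the forked workers, and several processes sharing one connection can deadlock. Below is a sketch of the workaround of giving each worker its own connection through the pool's initializer; `do_sym`, the table name, and the connection settings in `DB_ARGS` are placeholders, not my real ones:

import multiprocessing as mp
import MySQLdb

# Placeholder connection settings -- substitute real credentials.
DB_ARGS = dict(host='localhost', user='user', passwd='secret', db='mydb')

_conn = None  # one connection per worker process

def init_worker():
    # Runs once inside each forked worker, so every process gets its own
    # MySQL connection instead of sharing the parent's socket.
    global _conn
    _conn = MySQLdb.connect(**DB_ARGS)

def do_sym(sym):
    # Hypothetical worker: query using this process's own connection.
    cur = _conn.cursor()
    cur.execute("SELECT COUNT(*) FROM prices WHERE symbol = %s", (sym,))
    return sym, cur.fetchone()[0]

if __name__ == '__main__':
    pool = mp.Pool(4, initializer=init_worker)
    print(pool.map(do_sym, ['ABC', 'XYZ']))
    pool.close()
    pool.join()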

jeffery_the_wind
    `results` will not be shared between processes. Also, each process will reimport the module and create a new `Pool` and `map` the function. You need to set them in a separate `__main__` block. See the [docs](https://docs.python.org/2/library/multiprocessing.html#windows) – Peter Wood Oct 23 '15 at 13:06
  • `results` doesn't really need to be shared, just as long as all the results get appended eventually. This should work, shouldn't it? – jeffery_the_wind Oct 23 '15 at 13:08

1 Answer


`pool.map` returns the results in a list. So instead of calling `results.append` in the concurrent processes (which will not work, since each process will have its own independent copy of `results`), assign `results` to the value returned by `pool.map` in the main process:

import multiprocessing as mp

def do_stuff(text):
    return text

if __name__ == '__main__':
    p = mp.Pool(4)
    tasks = ['str{}'.format(i) for i in range(2000)]
    results = p.map(do_stuff, tasks)
    p.close()

    print(results)

yields

['str0', 'str1', 'str2', 'str3', ...]

One method of debugging scripts that use `multiprocessing` is to add logging statements. The `multiprocessing` module provides a helper function, `mp.log_to_stderr`, for this purpose. For example,

import multiprocessing as mp
import logging

logger = mp.log_to_stderr(logging.DEBUG)

def do_stuff(text):
    logger.info('Received {}'.format(text))
    return text

if __name__ == '__main__':
    p = mp.Pool(4)
    tasks = ['str{}'.format(i) for i in range(2000)]
    results = p.map(do_stuff, tasks)
    p.close()

    logger.info(results)

which yields logging output like:

[DEBUG/MainProcess] created semlock with handle 139824443588608
[DEBUG/MainProcess] created semlock with handle 139824443584512
[DEBUG/MainProcess] created semlock with handle 139824443580416
[DEBUG/MainProcess] created semlock with handle 139824443576320
[DEBUG/MainProcess] added worker
[INFO/PoolWorker-1] child process calling self.run()
[DEBUG/MainProcess] added worker
[INFO/PoolWorker-2] child process calling self.run()
[DEBUG/MainProcess] added worker
[INFO/PoolWorker-3] child process calling self.run()
[DEBUG/MainProcess] added worker
[INFO/PoolWorker-4] child process calling self.run()
[INFO/PoolWorker-1] Received str0
[INFO/PoolWorker-2] Received str125
[INFO/PoolWorker-3] Received str250
[INFO/PoolWorker-4] Received str375
[INFO/PoolWorker-3] Received str251
...
[INFO/PoolWorker-4] Received str1997
[INFO/PoolWorker-4] Received str1998
[INFO/PoolWorker-4] Received str1999
[DEBUG/MainProcess] closing pool
[INFO/MainProcess] ['str0', 'str1', 'str2', 'str3', ...]
[DEBUG/MainProcess] worker handler exiting
[DEBUG/MainProcess] task handler got sentinel
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] task handler sending sentinel to result handler
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] finalizing pool
[DEBUG/MainProcess] task handler sending sentinel to workers
[DEBUG/MainProcess] helping task handler/workers to finish
[DEBUG/MainProcess] result handler got sentinel
[DEBUG/PoolWorker-3] worker got sentinel -- exiting
[DEBUG/MainProcess] removing tasks from inqueue until task handler finished
[DEBUG/MainProcess] ensuring that outqueue is not full
[DEBUG/MainProcess] task handler exiting
[DEBUG/PoolWorker-3] worker exiting after 2 tasks
[INFO/PoolWorker-3] process shutting down
[DEBUG/MainProcess] result handler exiting: len(cache)=0, thread._state=0
[DEBUG/PoolWorker-3] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] joining worker handler
[DEBUG/MainProcess] terminating workers
[DEBUG/PoolWorker-3] running the remaining "atexit" finalizers
[DEBUG/MainProcess] joining task handler
[DEBUG/MainProcess] joining result handler
[DEBUG/MainProcess] joining pool workers
[DEBUG/MainProcess] cleaning up worker 4811
[DEBUG/MainProcess] running the remaining "atexit" finalizers

Notice that each line indicates which process emitted the logging record, so the output to some extent serializes the order of events among your concurrent processes.

By judicious placement of `logger.info` calls you should be able to narrow down where, and maybe why, your script is "dying silently" (or at least it won't be quite so silent as it dies).
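For instance, if the work function wraps several distinct steps (a database query, a calculation, a write), bracketing each step with a log call tells you which step never returned. Here is a sketch; `fetch_data` and `compute` are made-up placeholders standing in for whatever your real worker does:

import multiprocessing as mp
import logging
import time

logger = mp.log_to_stderr(logging.INFO)

def fetch_data(sym):
    # Placeholder for the real database query.
    time.sleep(0.1)
    return [sym] * 10

def compute(data):
    # Placeholder for the real calculation.
    return len(data)

def do_sym(sym):
    # Log before and after each step, so the last message seen in the
    # output identifies the step that never returned.
    logger.info('%s: fetching data', sym)
    data = fetch_data(sym)
    logger.info('%s: fetched %d rows, computing', sym, len(data))
    result = compute(data)
    logger.info('%s: done', sym)
    return result

if __name__ == '__main__':
    pool = mp.Pool(4)
    print(pool.map(do_sym, ['str{}'.format(i) for i in range(8)]))
    pool.close()
    pool.join()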

unutbu
  • Thanks, that is a great insight and I have already changed my code to operate like this; unfortunately the map function is still dying silently and I don't know exactly why. – jeffery_the_wind Oct 23 '15 at 14:18
  • Can you post a runnable example which "dies silently"? – unutbu Oct 23 '15 at 16:11
  • Probably not, the function that is iterating is pretty long. I just wish I could get an error message out of it or something. From the surface it just looks like it stops the loop altogether. Like you have there, I print the results right after `pool.close()`. The results get printed sometimes, and other times they don't. I will try to nail down an example. Thanks for the help. – jeffery_the_wind Oct 23 '15 at 16:39
  • I added a suggestion showing how to add logging statements to your program. – unutbu Oct 23 '15 at 16:49
  • Wow great thanks so much, you are good. I think the problem though is that the program never even makes it far enough to print the log, in that case. I think what is happening is that the map function thinks some process is still working and waiting for it. – jeffery_the_wind Oct 23 '15 at 17:19
  • I'm sorry yesterday I did not quite understand your answer. I have just implemented the logging like you have suggested, and that is great. Unfortunately I do not get any additional debugging info this way, and the program still hangs. The crazy thing is that it is not even consistent. I have found 1 place in the data where it hangs most of the time, but sometimes it gets past it, but ends up hanging later on. Please see my updated question. – jeffery_the_wind Oct 26 '15 at 09:38
  • Here is another way to debug problems like this: make a copy of your program. Then start simplifying it by paring away parts bit by bit. Remove a part of the calculation and then test if the problem still exists. If it still exists (great!) remove another part. If it no longer exists, then you know that the problem lies in the little bit you last removed. Keep going until you've identified the problem or can post a simple program which demonstrates the problem. – unutbu Oct 26 '15 at 10:35
  • Great, thank you, I am going to try that. Another thing I just thought of is that I don't have the problem when I use `Pool(1)`. The problem only happens when the number of processes is greater than 1. The hanging and stalling seems to happen earlier if I use more processes: with `Pool(6)` the program will usually stall after a lot fewer iterations than with `Pool(5)`. – jeffery_the_wind Oct 26 '15 at 12:07
  • Note also that if the computation is CPU-bound there is no benefit to using more processes than the machine has processors. If you use `Pool()` then the multiprocessing module will determine the number of processors available and create a pool of that size. So usually you do not need to set a number when initializing a `Pool()`. – unutbu Oct 26 '15 at 12:39
  • Using the method you suggested, I narrowed down the problem to a mysql database transaction. Once I removed that, the pool did not hang anymore. I was using the `MySQLdb` package before and I just switched it to a `pandas.read_sql` and it seems to be going well so far. Thanks for the help! – jeffery_the_wind Oct 26 '15 at 17:15
  • what would be the equivalent of `mp.log_to_stderr` for concurrent.futures? – rudolfovic Mar 08 '22 at 19:01