
The code below doesn't seem to run concurrently, and I'm not sure exactly why:

import multiprocessing

def run_normalizers(config, debug, num_threads, name=None):

    def _run():
        print('Started process for normalizer')
        sqla_engine = init_sqla_from_config(config)
        image_vfs = create_s3vfs_from_config(config, config.AWS_S3_IMAGE_BUCKET)
        storage_vfs = create_s3vfs_from_config(config, config.AWS_S3_STORAGE_BUCKET)

        pp = PipedPiper(config, image_vfs, storage_vfs, debug=debug)

        if name:
            pp.run_pipeline_normalizers(name)
        else:
            pp.run_all_normalizers()
        print('Normalizer process complete')

    threads = []
    for i in range(num_threads):
        threads.append(multiprocessing.Process(target=_run))
    [t.start() for t in threads]
    [t.join() for t in threads]


run_normalizers(...)

The config variable is just a dictionary defined outside of the _run() function. All of the processes seem to be created - but it isn't any faster than if I do it with a single process. Basically what's happening in the run_*_normalizers() functions is reading from a queue table in a database (SQLAlchemy), then making a few HTTP requests, and then running a 'pipeline' of normalizers to modify data and then save it back into the database. I'm coming from JVM land, where threads are 'heavy' and often used for parallelism - I'm a bit confused by this, as I thought the multiprocessing module was supposed to get around the limitations of Python's GIL.
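For reference, here is a minimal sketch of the kind of check I'd expect to show parallelism (do_work is just a placeholder for the real normalizer call, not code from the app): each worker should print its own PID, and the total wall-clock time should stay close to the time of a single worker.

import multiprocessing
import os
import time

def do_work():
    # placeholder for one normalizer run; the sleep stands in for I/O-bound work
    print('worker pid=%s started' % os.getpid())
    time.sleep(2)
    print('worker pid=%s finished' % os.getpid())

if __name__ == '__main__':
    start = time.time()
    procs = [multiprocessing.Process(target=do_work) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # with real parallelism this prints roughly 2 seconds, not 8
    print('total: %.1fs' % (time.time() - start))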

  • The multiprocessing module uses processes, not threads. It is therefore not affected by the GIL. – Lennart Regebro Jul 19 '13 at 04:56
  • I've tested your code and the essential technique is OK. I'm not sure about the shared `config`; if the `config` dictionary is used a lot, that could in theory slow things down. It's possible that the processor isn't your bottleneck here. – Lennart Regebro Jul 19 '13 at 05:23
  • I've only run it on my workstation: 8 cores, 16GB RAM, Linux. With 1, 8, or 16 processes nothing changes - and system resources are fine. – Brian Dilley Jul 19 '13 at 15:49
  • load average: 0.42, 0.31, 0.24 (this is when not running the app). This is while running the app: load average: 0.59, 0.53, 0.34. I don't think it's CPU bound. – Brian Dilley Jul 19 '13 at 15:53
  • And the load avg eventually dropped down to the pre-app rate, and a bit below at some points (I've also got X11 and Firefox and whatnot running on my workstation, so I don't think this is even registering). – Brian Dilley Jul 19 '13 at 15:54
  • OK, so I guessed correctly that the bottleneck is somewhere other than the CPU, and parallelizing it is not helpful at this point. The question then is where the bottleneck *is*, which I have no idea about. – Lennart Regebro Jul 20 '13 at 03:39
  • kcachegrind is telling me that most of the time is spent in socket code (urllib2 HTTP client socket reading) - still investigating this. – Brian Dilley Jul 20 '13 at 08:25
  • That seems reasonable. Normally then threading should be enough. If it doesn't help, the bottleneck might be the servers you talk to, or bandwidth. – Lennart Regebro Jul 21 '13 at 18:22
  • So I've stuck with multiprocessing (and tried threading as well), and I had something interesting happen. A server took a while to respond (as in < 45 seconds) once, and I watched as every single process stalled while it waited for that request to return data... maybe it's related to urllib2? (see the timeout sketch after these comments) – Brian Dilley Jul 26 '13 at 07:49
  • Fixed my multiprocessing problem - and actually switched to threads. Not sure what actually fixed it though - I just re-architected everything and made workers and tasks and what not, and things are flying now... so I'm going to close this question... thanks for the help guys. – Brian Dilley Jul 26 '13 at 19:59
  • The problem is most probably that your code is touching some shared file handles, or sockets to sqla, or something from before the fork. – Antti Haapala -- Слава Україні Jul 30 '13 at 15:31
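Following up on the stalled-request comment above: one mitigation worth trying (just a sketch - the fetch helper, the URL handling, and the 30-second value are illustrative, not the app's actual code) is to give every urllib2 call an explicit timeout so a single slow server can't hold a worker indefinitely.

import socket
import urllib2

def fetch(url, timeout=30):
    # fail fast instead of hanging on a slow or unresponsive server
    try:
        return urllib2.urlopen(url, timeout=timeout).read()
    except (urllib2.URLError, socket.timeout) as e:
        print('request to %s failed: %s' % (url, e))
        return None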

2 Answers


Fixed my multiprocessing problem - and actually switched to threads. Not sure what actually fixed it though - I just re-architected everything and made workers and tasks and what not, and things are flying now. Here are the basics of what I did:

import abc
from Queue import Empty, Queue
from threading import Thread

class AbstractTask(object):
    """
        The base task
    """
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def run_task(self):
        pass

class TaskRunner(object):

    def __init__(self, queue_size, num_threads=1, stop_on_exception=False):
        super(TaskRunner, self).__init__()
        self.queue              = Queue(queue_size)
        self.execute_tasks      = True
        self.stop_on_exception  = stop_on_exception

        # create a worker
        def _worker():
            while self.execute_tasks:

                # get a task
                task = None
                try:
                    task = self.queue.get(True, 1)  # block up to 1s so shutdown is noticed promptly
                except Empty:
                    continue

                # execute the task (catch exceptions so a failing task
                # doesn't silently kill the worker thread)
                failed = True
                try:
                    task.run_task()
                    failed = False
                except Exception as e:
                    print('Task raised an exception: %s' % e)
                finally:
                    if failed and self.stop_on_exception:
                        print('Stopping due to exception')
                        self.execute_tasks = False
                    self.queue.task_done()

        # start threads
        for i in range(0, int(num_threads)):
            t = Thread(target=_worker)
            t.daemon = True
            t.start()


    def add_task(self, task, block=True, timeout=None):
        """
            Adds a task
        """
        if not self.execute_tasks:
            raise Exception('TaskRunner is not accepting tasks')
        self.queue.put(task, block, timeout)


    def wait_for_tasks(self):
        """
            Waits for tasks to complete
        """
        if not self.execute_tasks:
            raise Exception('TaskRunner is not accepting tasks')
        self.queue.join()

All I do is create a TaskRunner and add tasks to it (thousands of them), and then call wait_for_tasks(). So obviously, in the re-architecture that I did, I 'fixed' some other problem that I had. Odd though.
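For completeness, a usage sketch - NormalizeTask and its record ids are made up for illustration; the real tasks wrap the normalizer pipeline:

class NormalizeTask(AbstractTask):
    """Hypothetical task wrapping one unit of normalizer work."""

    def __init__(self, record_id):
        self.record_id = record_id

    def run_task(self):
        # the real implementation fetches, normalizes and saves one record
        print('normalizing record %s' % self.record_id)

runner = TaskRunner(queue_size=1000, num_threads=16)
for record_id in range(10000):
    runner.add_task(NormalizeTask(record_id))
runner.wait_for_tasks()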

Brian Dilley

If you are still looking for a multiprocessing solution, you might first want to check out how to use a pool of workers - then you wouldn't have to manage the num_threads processes on your own: http://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers
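A minimal sketch of that pattern (worker here is just a placeholder for a single normalizer run; note that with a Pool the target has to be a picklable top-level function, so it can't be a closure like _run):

import multiprocessing

def worker(task_id):
    # placeholder for one normalizer run
    print('running task %s' % task_id)
    return task_id

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=8)   # omit processes to default to cpu_count()
    results = pool.map(worker, range(100))     # blocks until every task has finished
    pool.close()
    pool.join()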

And for the slowdown problem, have you tried passing the config object as an argument to the _run function? I don't know whether or how this would make a difference internally, but my guess is that it might change something.
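For example (a sketch only - the stand-in config dict is illustrative; whether it helps depends on what config actually holds):

import multiprocessing

def _run(config):
    # work from the config that was passed in explicitly,
    # not one captured from an enclosing scope
    print('child got config with %d keys' % len(config))

if __name__ == '__main__':
    config = {'AWS_S3_IMAGE_BUCKET': 'images'}   # illustrative stand-in
    p = multiprocessing.Process(target=_run, args=(config,))
    p.start()
    p.join()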

moschlar