
I have multiple parallel threads writing into one list in Python. My code is:

import threading

global_list = []

class MyThread(threading.Thread):
    ...

    def run(self):
        results = self.calculate_results()
        global_list.extend(results)


def total_results():
    for param in params:
        t = MyThread(param)
        t.start()
    # busy-wait until only the main thread is left
    while threading.active_count() > 1:
        pass
    return global_list

I don't like this approach, as it has:

  1. An overall global variable -> What would be the way to have a variable local to the `total_results` function?
  2. The busy-wait loop I use to check whether all threads are done seems somewhat clumsy; what would be the standard way?
ProfHase85
  • Please note that in your current code you're modifying shared memory (the global list) from more than one thread, and you need a mutex/lock around the modification operations. – davecom Sep 09 '14 at 12:33
  • @davecom: that is intuitive but perhaps not really true, given that Python's Global Interpreter Lock already prevents two threads from modifying the list at the same time, even without explicit locking. – John Zwinck Sep 09 '14 at 12:34
  • According to http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm there is no need for locks, as `extend` is an atomic operation (I am using CPython, so locking is not the problem here) – ProfHase85 Sep 09 '14 at 12:36
  • I would use `RLock()` with `acquire()` and `release()` – Vor Sep 09 '14 at 12:36
  • @JohnZwinck Good point, but not all Python interpreters have the GIL – davecom Sep 09 '14 at 12:36
  • @davecom: sure, only the ones in production. ;-) I kid, sort of. – John Zwinck Sep 09 '14 at 12:37
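
For interpreters without a GIL (the portability point davecom raises), a minimal sketch of the explicit-lock variant could look like this; `calculate_results` here is a stand-in for the real computation:

import threading

global_list = []
list_lock = threading.Lock()

class MyThread(threading.Thread):
    def __init__(self, param):
        super().__init__()
        self.param = param

    def calculate_results(self):
        # stand-in for the real computation
        return [self.param]

    def run(self):
        results = self.calculate_results()
        # guard the shared list explicitly instead of relying on the GIL
        with list_lock:
            global_list.extend(results)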

2 Answers


Is your computation CPU-intensive? If so, you should look at the multiprocessing module, which is included with Python and offers a fairly easy-to-use Pool class into which you can feed compute tasks and later collect all the results. If you need a lot of CPU time, this will be faster anyway, because Python doesn't do threading all that well: only one thread can execute Python bytecode at a time in a single process. Multiprocessing sidesteps that (and offers the Pool abstraction, which makes your job easier). Oh, and if you really want to stick with threads, multiprocessing has a ThreadPool too.
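
A minimal sketch of that Pool approach (the function and parameter names here are placeholders; calculate_results must be a top-level function so it can be pickled for the worker processes):

from multiprocessing import Pool

def calculate_results(param):
    # placeholder for the real CPU-bound work
    return [param * param]

def total_results(params):
    with Pool() as pool:  # defaults to one worker process per CPU
        per_task = pool.map(calculate_results, params)
    # flatten the per-task lists into a single result list
    return [item for sublist in per_task for item in sublist]

if __name__ == '__main__':
    print(total_results(range(4)))  # [0, 1, 4, 9]

Swapping Pool for multiprocessing.pool.ThreadPool keeps the same interface but runs the tasks in threads instead of processes.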

John Zwinck

1 - Use a class variable, shared between all Worker instances, to collect the results

from threading import Thread

class Worker(Thread):
    results = []
    ...

    def run(self):
        results = self.calculate_results()
        Worker.results.extend(results)  # in CPython, extending a list is atomic under the GIL

2 - Use join() to block until all the threads are done (unlike a busy-wait loop, this doesn't burn CPU time)

def total_results(params):
    # create all workers
    workers = [Worker(p) for p in params]

    # start all workers
    for w in workers:
        w.start()

    # wait for all of them to finish
    for w in workers:
        w.join()

    # get the results
    return Worker.results
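
Put together, a runnable sketch of the same pattern (the constructor and calculate_results are stand-ins for whatever the real Worker does):

from threading import Thread

class Worker(Thread):
    results = []  # shared by all Worker instances

    def __init__(self, param):
        super().__init__()
        self.param = param

    def calculate_results(self):
        # stand-in for the real computation
        return [self.param * 2]

    def run(self):
        Worker.results.extend(self.calculate_results())

def total_results(params):
    workers = [Worker(p) for p in params]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return Worker.results

print(total_results([1, 2, 3]))  # e.g. [2, 4, 6] (order depends on scheduling)

One caveat: because results is a class variable, it persists across calls to total_results; reset it with Worker.results = [] at the start of the function if you call it more than once.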
sebdelsol