
I have multiple parallel threads writing into one list in Python. My code is:

import threading

global_list = []

class MyThread(threading.Thread):
    ...

    def run(self):
        results = self.calculate_results()
        global_list.extend(results)


def total_results():
    for param in params:
        t = MyThread(param)
        t.start()
    # busy-wait until only the main thread is left
    while threading.active_count() > 1:
        pass
    return global_list

I don't like this approach, as it has:

  1. An overall global variable -> What would be the way to have a variable local to the `total_results` function?
  2. The busy-wait loop I use to check whether all threads are done seems somewhat clumsy; what would be the standard way?
ProfHase85
  • Please note that in your current code you're modifying shared memory (the global list) from more than one thread, and you need a mutex/lock around the modification operations. – davecom Sep 09 '14 at 12:33
  • @davecom: that is intuitive but perhaps not really true, given that Python's Global Interpreter Lock already prevents two threads from modifying the list at the same time, even without explicit locking. – John Zwinck Sep 09 '14 at 12:34
  • According to http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm there is no need for locks, as `extend` is an atomic operation (I am using CPython, so locking is not the problem here) – ProfHase85 Sep 09 '14 at 12:36
  • I would use `RLock()` with `acquire()` and `release()` – Vor Sep 09 '14 at 12:36
  • @JohnZwinck Good point, but not all Python interpreters have the GIL – davecom Sep 09 '14 at 12:36
  • @davecom: sure, only the ones in production. ;-) I kid, sort of. – John Zwinck Sep 09 '14 at 12:37
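
For interpreters without a GIL (the portability point davecom raises), a minimal sketch of the explicit-lock variant could look like this; `calculate_results` here is a stand-in for the real computation:

import threading

global_list = []
list_lock = threading.Lock()

class MyThread(threading.Thread):
    def __init__(self, param):
        super().__init__()
        self.param = param

    def calculate_results(self):
        # stand-in for the real computation
        return [self.param]

    def run(self):
        results = self.calculate_results()
        # guard the shared list explicitly instead of relying on the GIL
        with list_lock:
            global_list.extend(results)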

2 Answers


Is your computation CPU-intensive? If so, you should look at the multiprocessing module, which is included with Python and offers a fairly easy-to-use Pool class into which you can feed compute tasks and later collect all the results. If you need a lot of CPU time, this will be faster anyway, because Python doesn't do threading all that well: only one thread can execute Python bytecode at a time in a single process. Multiprocessing sidesteps that (and offers the Pool abstraction, which makes your job easier). Oh, and if you really want to stick with threads, multiprocessing has a ThreadPool too.
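
A minimal sketch of that Pool approach (the function and parameter names here are placeholders; calculate_results must be a top-level function so it can be pickled for the worker processes):

from multiprocessing import Pool

def calculate_results(param):
    # placeholder for the real CPU-bound work
    return [param * param]

def total_results(params):
    with Pool() as pool:  # defaults to one worker process per CPU
        per_task = pool.map(calculate_results, params)
    # flatten the per-task lists into a single result list
    return [item for sublist in per_task for item in sublist]

if __name__ == '__main__':
    print(total_results(range(4)))  # [0, 1, 4, 9]

Swapping Pool for multiprocessing.pool.ThreadPool keeps the same interface but runs the tasks in threads instead of processes.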

John Zwinck

1 - Use a class variable, shared between all Worker instances, to collect the results

from threading import Thread

class Worker(Thread):
    results = []
    ...

    def run(self):
        results = self.calculate_results()
        Worker.results.extend(results)  # in CPython, extending a list is atomic under the GIL

2 - Use join() to block until all the threads are done (unlike a busy-wait loop, this doesn't burn CPU time)

def total_results(params):
    # create all workers
    workers = [Worker(p) for p in params]

    # start all workers
    for w in workers:
        w.start()

    # wait for all of them to finish
    for w in workers:
        w.join()

    # get the results
    return Worker.results
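
Put together, a runnable sketch of the same pattern (the constructor and calculate_results are stand-ins for whatever the real Worker does):

from threading import Thread

class Worker(Thread):
    results = []  # shared by all Worker instances

    def __init__(self, param):
        super().__init__()
        self.param = param

    def calculate_results(self):
        # stand-in for the real computation
        return [self.param * 2]

    def run(self):
        Worker.results.extend(self.calculate_results())

def total_results(params):
    workers = [Worker(p) for p in params]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return Worker.results

print(total_results([1, 2, 3]))  # e.g. [2, 4, 6] (order depends on scheduling)

One caveat: because results is a class variable, it persists across calls to total_results; reset it with Worker.results = [] at the start of the function if you call it more than once.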
sebdelsol