
I have a quick question about a variable shared between multiple processes using multiprocessing.Pool().

Will I run into any issues if I am updating a global list from within multiple processes? I.e., if two of the processes were to try to update the list at the same time.

I have seen documentation about using a Lock for similar things but I was wondering if it was necessary.

EDIT:

The way I am sharing this variable is via a global list, TOTAL_SUCCESSES, which my callback function appends each successful action to after the target function has completed:

import multiprocessing as mp

TOTAL_SUCCESSES = []

def func(inputs):
    successes = []

    for input in inputs:
        result = ...  # something with a return code
        if result == 0:
            successes.append(input)
    return successes

def callback(successes):
    global TOTAL_SUCCESSES

    for entry in successes:
        TOTAL_SUCCESSES.append(entry)

def main():
    pool = mp.Pool()
    for entry in myInputs:
        pool.apply_async(func, args=(entry,), callback=callback)
    pool.close()
    pool.join()

Apologies for any syntax errors; I wrote this up quickly. The program is working, I'm just wondering whether I will have issues once I add the shared variable.

Thanks in advance!

DJMcCarthy12
  • How are you sharing the list? If you are passing a simple list in directly, each process will get a local copy of the list. Are you using a queue? A multiprocessing.Array? – Silas Ray Aug 20 '14 at 18:13
  • You'll glean a great deal by reading through the multiprocessing examples: https://docs.python.org/3/library/multiprocessing.html?highlight=multiprocessing#examples, but until you have some sample code for your implementation we can't tell you what pitfalls you might encounter. – g.d.d.c Aug 20 '14 at 18:14
  • Hey guys, thanks for the responses, edited with some code. – DJMcCarthy12 Aug 20 '14 at 18:21

1 Answer


With your current code, you're not actually sharing TOTAL_SUCCESSES between processes. callback is executed in the main process, in a result-handling thread. There is only one result-handling thread, so each callback will be run one at a time, not concurrently. So your code as written is process/thread safe.
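You can see this for yourself with a small sketch (the names here, work and on_result, are illustrative, not from the post): every callback records the name of the thread it ran on, and only one thread name ever shows up.

```python
import multiprocessing as mp
import threading

CALLBACK_THREADS = set()

def work(x):
    return x * x

def on_result(result):
    # Record which thread ran this callback; it is always the pool's
    # single result-handler thread in the parent process.
    CALLBACK_THREADS.add(threading.current_thread().name)

if __name__ == '__main__':
    pool = mp.Pool(4)
    for i in range(10):
        pool.apply_async(work, args=(i,), callback=on_result)
    pool.close()
    pool.join()  # joins the result-handler thread, so all callbacks have run
    print(len(CALLBACK_THREADS))  # 1: every callback ran on the same thread
```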

However, you are forgetting to return successes from func, which you'll want to fix.

Edit:

Also, this could be much more succinctly written using map:

def func(inputs):
    successes = []

    for input in inputs:
        result = ...  # something with a return code
        if result == 0:
            successes.append(input)
    return successes

def main():
    pool = mp.Pool()
    total_successes = pool.map(func, myInputs)  # returns a list of lists
    # Flatten the list of lists
    total_successes = [ent for sublist in total_successes for ent in sublist]
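For completeness, here is a runnable sketch of that map-based version (the input data and the even-number check are placeholders standing in for the real return-code logic), flattening with itertools.chain.from_iterable instead of a nested comprehension:

```python
import itertools
import multiprocessing as mp

def func(inputs):
    # Placeholder predicate: keep even numbers
    # (stands in for a real "return code == 0" check).
    return [i for i in inputs if i % 2 == 0]

def main():
    my_inputs = [[1, 2, 3], [4, 5], [6, 7, 8]]  # example data
    with mp.Pool() as pool:
        nested = pool.map(func, my_inputs)  # list of lists
    # Flatten the list of lists
    return list(itertools.chain.from_iterable(nested))

if __name__ == '__main__':
    print(main())  # [2, 4, 6, 8]
```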
dano