
I'm cutting my teeth on multiprocessing in Python, but I'm not having any luck wrapping my head around the subject. Basically, I have a procedure that is time-consuming to run. I need to run it for a range of 1 to 100, but I'd like to abort all processes once the condition I'm looking for has been met, that condition being a return value == 90.

Here is a non-multiprocessing chunk of code. Can anyone give me an example of how they would convert it to a multiprocessing function where the code will exit all processes once the condition of "90" has been met?

def Addsomething(i):
    SumOfSomething = i + 1    
    return SumOfSomething

def RunMyProcess():
    for i in range(100):
        Something = Addsomething(i)
        print Something
    return

if __name__ == "__main__":
    RunMyProcess()

Edit:

I got this error while testing the 3rd version. Any idea what is causing this?

Exception in thread Thread-3:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 554, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 507, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Python27\lib\multiprocessing\pool.py", line 379, in _handle_results
    cache[job]._set(i, obj)
  File "C:\Python27\lib\multiprocessing\pool.py", line 527, in _set
    self._callback(self._value)
  File "N:\PV\_Proposals\2013\ESS - Clear Sky\01-CODE\MultiTest3.py", line 20, in check_result
    pool.terminate()
  File "C:\Python27\lib\multiprocessing\pool.py", line 423, in terminate
    self._terminate()
  File "C:\Python27\lib\multiprocessing\util.py", line 200, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "C:\Python27\lib\multiprocessing\pool.py", line 476, in _terminate_pool
    result_handler.join(1e100)
  File "C:\Python27\lib\threading.py", line 657, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread
EnergyGeek
  • Why don't you just put an empty return after the print Something line? – ZekeDroid Jan 31 '14 at 21:22
  • Do you want to run separate processes, or separate threads? It is easy to send kill signals to Popen processes, but for threads you need to manually create and check something like an Event object. Also, can you clarify whether you just want RunMyProcess to execute in the background, or whether you want Addsomething to execute for each value of i in parallel? – Dane White Jan 31 '14 at 21:27
  • Additionally, keep in mind that threads don't really have a return value in the same way that a function does. Return values need to be accessed by either a polling or callback function. – Dane White Jan 31 '14 at 21:43
  • I'm looking to run separate processes. My actual code takes about 30 seconds to run for each iteration. For this example, "Addsomething" represents my code that takes so long and needs to run for each iteration. I want to run each i in parallel but kill all processes once I hit a value of 90. – EnergyGeek Jan 31 '14 at 21:48

1 Answer


Maybe something like this is what you're looking for? Keep in mind I'm writing for Python 3. Your print statement above is Python 2; as a side note, on that version you'd also want to use xrange instead of range.

from argparse import ArgumentParser
from random import random
from subprocess import Popen
from sys import exit
from time import sleep

def add_something(i):

    # Sleep to simulate the long calculation
    sleep(random() * 30)
    return i + 1

def run_my_process():

    # Start up all of the processes, pass i as command line argument
    # since you have your function in the same file, we'll have to handle that
    # inside 'main' below
    processes = []
    for i in range(100):
        processes.append(Popen(['python', 'thisfile.py', str(i)]))

    # Wait for your desired process result
    # Might want to add a short sleep to the loop
    done = False
    while not done:
        for proc in processes:
            returncode = proc.poll()
            if returncode == 90:
                done = True
                break

    # Kill any processes that are still running
    for proc in processes:

        if proc.returncode is None:

            # Might run into a race condition here,
            # so might want to wrap with try block
            proc.kill()

if __name__ == '__main__':

    # Look for optional i argument here
    parser = ArgumentParser()
    parser.add_argument('i', type=int, nargs='?')
    i = parser.parse_args().i

    # If there isn't an i, then run the whole thing
    if i is None:
        run_my_process()

    else:
        # Otherwise, run your expensive calculation and return the result
        returncode = add_something(i)
        print(returncode)
        exit(returncode)

EDIT:

Here's a somewhat cleaner version that uses the multiprocessing module instead of subprocess:

from random import random
from multiprocessing import Process
from sys import exit
from time import sleep

def add_something(i):

    # Sleep to simulate the long calculation
    sleep(random() * 30)

    exitcode = i + 1
    print(exitcode)
    exit(exitcode)

def run_my_process():

    # Start up all of the processes
    processes = []
    for i in range(100):
        proc = Process(target=add_something, args=[i])
        processes.append(proc)
        proc.start()

    # Wait for the desired process result
    done = False
    while not done:
        for proc in processes:
            if proc.exitcode == 90:
                done = True
                break

    # Kill any processes that are still running
    for proc in processes:
        if proc.is_alive():
            proc.terminate()

if __name__ == '__main__':
    run_my_process()
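One caveat with the version above: the while loop polls exitcode as fast as it can, so the parent pegs a CPU core the whole time the workers run. A short sleep between passes avoids that. A sketch of the polling loop factored into a helper (wait_for_exitcode and poll_interval are made-up names):

```python
import time

def wait_for_exitcode(processes, target, poll_interval=0.1):
    """Poll until some process has exited with the target code; return it."""
    while True:
        for proc in processes:
            # exitcode is None while a process is still running
            if proc.exitcode == target:
                return proc
        time.sleep(poll_interval)  # yield the CPU between polling passes
```

run_my_process could then replace its while loop with wait_for_exitcode(processes, 90). Like the original loop, this blocks forever if no process ever hits the target code.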

EDIT 2:

Here's one last version, which I think is much better than the other two:

from random import random
from multiprocessing import Pool
from time import sleep

def add_something(i):

    # Sleep to simulate the long calculation
    sleep(random() * 30)
    return i + 1

def run_my_process():

    # Create a process pool
    pool = Pool(100)

    # Callback function that checks results and kills the pool
    def check_result(result):
        print(result)
        if result == 90:
            pool.terminate()

    # Start up all of the processes
    for i in range(100):
        pool.apply_async(add_something, args=[i], callback=check_result)

    pool.close()
    pool.join()

if __name__ == '__main__':
    run_my_process()
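A note on returning more than one value through the callback: apply_async hands the callback exactly what the worker returns, so several results can travel together as a single tuple and be unpacked on arrival. A sketch with made-up names (add_many, collect):

```python
from multiprocessing import Pool

def add_many(i):
    # Pack several results into one tuple
    return i, i + 1, [i, i]

def run_with_tuple_results():
    results = []

    def collect(result):
        # The callback receives the whole tuple and can unpack it
        value, plus_one, pair = result
        results.append((value, plus_one, pair))

    pool = Pool(4)
    for i in range(5):
        pool.apply_async(add_many, args=[i], callback=collect)
    pool.close()
    pool.join()
    return sorted(results)
```

The callback runs in a thread of the parent process, so appending to a plain list is safe here; sharing data between the workers themselves would still call for a Manager, as mentioned in the comments below.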
Dane White
  • I have 2.7 installed, but this ran without any errors. It also stops at 90 every time. I think I have a grasp as to how this works. I'll need to spend some time manipulating it to get it to work with my code. I'll let you know if I get stuck. – EnergyGeek Jan 31 '14 at 23:32
  • Is it essential to call using: if __name__ == '__main__': run_my_process() – EnergyGeek Feb 01 '14 at 00:40
  • If the error you're getting is 'RuntimeError: cannot join current thread', then this is a Python bug that was fixed in 2.7 and 3.2. See http://bugs.python.org/issue15101 for details. When I run the built-in 2.7 on my Mac, I get this error, but it runs fine with the 3.3 version I have installed. – Dane White Feb 04 '14 at 17:52
  • It looks like I'm using 2.7.2. The latest version is 2.7.6. I'll try updating and see if that works. – EnergyGeek Feb 04 '14 at 18:59
  • Yes, updating to 2.7.6 fixed this issue. Thanks! – EnergyGeek Feb 04 '14 at 20:12
  • If I need to return multiple arguments, how would you suggest I go about doing it with the callback function? My script needs to pass a few lists and a couple of other arguments. – EnergyGeek Feb 05 '14 at 00:27
  • Ah, I was able to solve this using a Manager and creating a list. I'm somewhat worried about a possible race condition, but everything seems to work, although my CPUs are pegged at 100%. :) – EnergyGeek Feb 05 '14 at 17:30
  • Glad everything worked out for you. Would you consider this the accepted answer, then? – Dane White Feb 06 '14 at 18:47