3

I'm working with IPython and Spyder IDE on a Windows machine. When the IDE is starting, a set of py-files is loaded to define some functions that make my work a bit easier. Everything works as expected.

Now I would like to upgrade one of these functions to use multiprocessing, but on Windows this requires the if __name__ == "__main__": statement. So it seems that I cannot call the function directly and pass the arguments from the IPython console.

For example, one of the py-files (let's call it test.py) could look like the following code.

import multiprocessing as mp
import random
import string

# Define an example function
def rand_string(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                string.ascii_lowercase
                + string.ascii_uppercase
                + string.digits)
           for i in range(length))
    output.put(rand_str)


def myFunction():
    # Define an output queue
    output = mp.Queue()        

    # Setup a list of processes that we want to run
    processes = [mp.Process(target=rand_string, args=(5, output)) for x in range(4)]

    # Run processes
    for p in processes:
        p.start()

    # Exit the completed processes
    for p in processes:
        p.join()

    # Get process results from the output queue
    results = [output.get() for p in processes]

    print(results)

In my IPython console I would like to use the line

myFunction()

to trigger all the calculations. But on Windows I end up getting a BrokenPipe error.

When I put

if __name__ == "__main__":
     myFunction()

at the end of the py-file and run the complete file by

runfile('test.py')

it works. Of course. But that makes it very hard to pass arguments to the function as I always have to edit the test.py-file itself.

My question is: How do I get the multiprocessing function running without putting it in this if __name__ == "__main__": statement?

boardrider
RaJa
2 Answers

4

So, I solved that specific problem.

  1. Put the definition of rand_string in a separate file, test2.py.

  2. Import test2 as a module into the test.py script:

    import test2

  3. Modify the following line to access the test2 module:

    processes = [mp.Process(target=test2.rand_string, args=(5, output)) for x in range(4)]
    
  4. Run test.py

  5. Call myFunction()

  6. Be Happy :)

The solution is based on this multiprocessing tutorial, which suggests importing the target function from another script. This sidesteps the safe self-import that the if __name__ == "__main__" wrapper normally protects: the child processes get the target function by importing test2 instead of the main script.
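As a rough sketch of how this layout also solves the argument-passing problem, the worker and a parametrized driver could live together in one importable module. The module name test2.py comes from the steps above; make_rand_string, the my_function signature with length and n_procs, and the choice to read the queue before joining are assumptions for illustration, not part of the original answer.

```python
# Hypothetical contents of test2.py, assumed to be on the import path.
import multiprocessing as mp
import random
import string

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def make_rand_string(length):
    """Return a random string of digits and lower- and uppercase letters."""
    return ''.join(random.choice(ALPHABET) for _ in range(length))


def rand_string(length, output):
    """Worker target: put one random string on the output queue."""
    output.put(make_rand_string(length))


def my_function(length=5, n_procs=4):
    """Start n_procs workers and return their results as a list."""
    output = mp.Queue()
    processes = [mp.Process(target=rand_string, args=(length, output))
                 for _ in range(n_procs)]

    for p in processes:
        p.start()

    # Read one result per worker before joining, so that no worker
    # is left waiting to flush its data into the queue.
    results = [output.get() for _ in processes]

    for p in processes:
        p.join()

    return results
```

Assuming the file is importable, running import test2 and then test2.my_function(8, 3) from the IPython console should return three strings of eight characters each, without editing the file. Reading the results before join() follows the multiprocessing documentation's advice to empty a queue before joining the processes that feed it.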

RaJa
3

multiprocessing doesn't work without running with if __name__ == '__main__'.

You could, however, use a fork of multiprocessing that essentially leverages dill to treat the interpreter session as a file… (in short, it works).

Python 2.7.9 (default, Dec 11 2014, 01:21:43) 
Type "copyright", "credits" or "license" for more information.

IPython 3.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from pathos.multiprocessing import ProcessingPool as Pool

In [2]: def squared(x):
   ...:     return x**2
   ...: 

In [3]: x = range(10)

In [4]: p = Pool()

In [5]: p.map(squared, x)
Out[5]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [6]: res = p.imap(squared, x)

In [7]: list(res)
Out[7]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [8]: 

You can also use the built-in multiprocessing augmented by the dill serializer, or you can build a Queue with Pool().apipe; either of these is closer to what you seem to want to do with Queue.

Get pathos here: https://github.com/uqfoundation
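For comparison, the same map can be sketched with the standard library's multiprocessing.Pool, under the assumption that squared is defined in an importable module rather than typed at the prompt; that is the restriction pathos lifts. The parallel_squares helper and its processes parameter are hypothetical names, and nothing is started at import time.

```python
# Hypothetical importable module; nothing runs when it is imported.
import multiprocessing as mp


def squared(x):
    """Module-level function, so worker processes can import it by name."""
    return x ** 2


def parallel_squares(values, processes=2):
    """Map squared over values using a pool of worker processes."""
    with mp.Pool(processes=processes) as pool:
        return pool.map(squared, values)
```

After importing the module, parallel_squares(range(10)) would be called from the console; whether this works inside a given IDE console on Windows is not something the thread confirms.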

Mike McKerns
  • 33,715
  • 8
  • 119
  • 139
  • It seems worth a try, but I am using Python 3.4, which is currently not supported by pathos. Maybe in two months, when the next version is released that also supports Windows. – RaJa Apr 20 '15 at 10:50
  • The `pathos` master/trunk now supports Windows, but is still `2.x`-only. It should work on `3.x` shortly, however. The core of `pathos.multiprocessing` is now in a standalone package called `multiprocess` that works on `2.x` and `3.x`, and also supports Windows. – Mike McKerns Jun 29 '15 at 18:02