
I am reading up on methods of multiprocessing in Python and have found myself with a question. Consider the following example:

import multiprocessing as mp

def worker(n):
    print('worker %d' % n)
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = mp.Process(target = worker, args = (i,))
        jobs.append(p)
        p.start()

This is how the documentation that I am following uses Process.

Is it necessary to use args = (i,)? I have never seen this syntax in Python before and it seems strange. I tested, and this works exactly the same:

p = mp.Process(target = worker(i))

Is there any reason that I should avoid that? Thanks for any help.

pretzlstyle
  • You can grab the sys.argv[n] at a specific location – FirebladeDan Jul 27 '15 at 21:50
  • You could also use args = [i], the main requirement being that it must be unpackable with * to an argument list, see https://docs.python.org/2/tutorial/controlflow.html#unpacking-argument-lists and https://docs.python.org/3.1/tutorial/controlflow.html#unpacking-argument-lists. –  Jul 27 '15 at 23:16
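The unpacking that the comment above mentions is easy to see in isolation. Here is a small sketch with a throwaway worker function (the name is just for illustration):

```python
def worker(n):
    return n

# Process hands args to the target roughly as target(*args),
# so any sequence that unpacks to the argument list works:
assert worker(*(5,)) == 5   # a one-element tuple
assert worker(*[5]) == 5    # a one-element list works too
```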

2 Answers


Here's a quick way to prove to yourself that it isn't the same thing. Change your worker function to this:

import time

def worker(n):
    print('worker %d' % n)
    time.sleep(1)
    return

When you call this the first way, you'll notice that you still get all 5 prints at the same time right at the beginning. When you do it your second way, you'll see that you get all 5 prints staggered, with a second between each one.

Think about what you're trying to set up: 5 independent processes each start at about the same time, each prints at about the same time, and then each waits about a second, yet the total elapsed time is only a little more than a second. This is what you want to have happen.
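Putting that together, a timed version of the experiment might look like the sketch below (the exact elapsed time will vary a little from run to run):

```python
import multiprocessing as mp
import time

def worker(n):
    print('worker %d' % n)
    time.sleep(1)

if __name__ == '__main__':
    start = time.time()
    jobs = []
    for i in range(5):
        p = mp.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()   # wait for all five children to finish
    # All five one-second sleeps overlap, so this prints roughly
    # 1 second, not 5.
    print('elapsed: %.2f s' % (time.time() - start))
```

With target = worker(i) instead, the loop itself would block for a second per iteration and the total would be about 5 seconds.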

Here's the key point: target = worker sets target to be the worker function, and args = (i,) sets args to be a single element tuple containing i. On the other hand, target = worker(i) calls worker(i) and sets target to be the value that the function returns, in this case None. You're not really using multiprocessing at all when you do it the second way. If you have a really time consuming task that you want to split across multiple processes, then you'll see no improvement when it's done the second way.

Basically, anytime you have func(args), you're going to be calling the function and getting its return value, whereas when you pass func and args separately, you allow the multiprocessing package to work its magic and make those function calls on independent processes. Setting the target to be func(args) will just call the function in the main process, losing you any benefits from multiprocessing in the first place.
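The difference is visible even without multiprocessing at all. Here is a minimal sketch with a hypothetical function that just doubles its argument:

```python
def double(n):
    return n * 2

# Passing the function and its arguments separately: nothing runs yet.
target = double      # a callable, to be invoked later (in the child process)
args = (3,)
assert callable(target)

# multiprocessing effectively does this inside the child process:
result = target(*args)
assert result == 6

# Writing double(3) instead calls it immediately, right here, and you
# would be handing Process the return value (6 here, or None for a
# function with no return statement) instead of a callable.
already_called = double(3)
assert already_called == 6
```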

Blair

The trailing comma is what declares a tuple; without it, the parentheses are just grouping, and you keep the original type (an int, a string, a float…):

>>> i = 4
>>> k = (i,)
>>> type(k)
<type 'tuple'>
>>> k=(i)
>>> type(k)
<type 'int'>
Clodion
    You haven't (and I did not vote you down), but your answer didn't answer my question fully. I was asking the purpose in separating the arguments from the function name, I didn't realize what it meant to pass a function itself rather than the result of a function. But you addressed differences in data types. And people on StackOverflow are always ready to downvote for no good reason. – pretzlstyle Jul 28 '15 at 21:58
  • @jphollowed: you're right, I have responded only to a part of the question. And yes, down-vote seems easier than up-vote. :-) – Clodion Jul 28 '15 at 22:04