From multiprocessing to distributed processing in python standard library

Question

I am studying this code from gitHub about distributed processing. I would like to thank eliben for this nice post. I have read his explanations but there are some dark spots. As far as I understand, the code is for distributing tasks in multiple machines/clients. My questions are:

The most basic of my questions is where the distribution of the work to different machines is happening?
Why there is an if else statement in the main function?
Let me start this question in a more general way. I thought that we usually start a Process in a specific chunk (independent memory part) and not pass all the chunks at once like this:
```
chunksize = int(math.ceil(len(HugeList) / float(nprocs)))
for i in range(nprocs):
p = Process(
            target = myWorker, # This is my worker
            args=(HugeList[chunksize * i:chunksize * (i + 1)],
            HUGEQ)      
processes.append(p)
p.start() 
```
In this simple case where we have nprocs processes. Each process initiate an instance of the function myWorker that work on the specified chunk.

My question here is:
- How many threads do we have for each process that work in each chunk?
Looking now into the gitHub code I am trying to understand the mp_factorizer? More specifically, in this function we do not have chunks but a huge queue (shared_job_q). This huge queue is consisted of sub-lists of size 43 maximum. This queue is passed into the factorizer_worker. There via get we obtain those sub-lists and pass them into the serial worker. I understand that we need this queue to share data between clients.

My questions here is:
- Do we call an instance of the factorizer_worker function for each of the nprocs(=8) processes?
- Which part of the data each process work? (Generally, we have 8 processes and 43 chunks.)
- How many threads exist for each process?
- Does get function called from each process thread?

Thanks for your time.

score 2 · Accepted Answer · answered Aug 28 '14 at 22:22

The distribution to multiple machines only happens if you actually run the script on multiple machines. The first time you run the script (without the --client option), it starts the Manager server on a specific IP/port, which hosts the shared job/result queues. In addition to starting the Manager server, runserver will also act as a worker, by calling mp_factorizer. It is additionally responsible for collecting the results from the result queue and processing them. You could run this script by itself and get a complete result.

However, you can also distribute the factorization work to other machines, by running the script on other machines using the --client option. That will call runclient, which will connect to the existing Manager server you started with the initial run of the script. That means that the clients are accessing the same shared queues runserver is using, so they can all pull work from and put results to the same queues.

The above should covers questions 1 and 2.

I'm not exactly sure what you're asking in question 3. I think you're wondering why we don't pass a chunk of the list to each worker explicitly (like in the example you included), rather than putting all the chunks into a queue. The answer there is because the runserver method doesn't know how many workers there are actually going to be. It knows that it's going to start 8 workers. However, it doesn't want to split the HugeList into eight chunks and send them to the 8 processes it's creating, because it wants to support remote clients connection to the Manager and doing work, too. So instead, it picks an arbitrary size for each chunk (43) and divides the list into as many chunks of that size as it takes to consume the entire HugeList, and sticks it in a Queue. Here's the code in runserver that does that:

chunksize = 43
for i in range(0, len(nums), chunksize):
    #print 'putting chunk %s:%s in job Q' % (i, i + chunksize)
    shared_job_q.put(nums[i:i + chunksize]) # Adds a 43-item chunk to the shared queue.

That way, as many workers as you want can connect to the Manager server, grab a chunk from shared_job_q, process it, and return a result.

Do we call an instance of the factorizer_worker function for each of the nprocs(=8) processes?

Yes

Which part of the data each process work? (Generally, we have 8 processes and 43 chunks.)

We don't have 43 chunks. We have X number of chunks, each of size 43. Each worker process just grabs chunks off the queue and processes them. Which part it gets is arbitrary and depends on how many workers there are and how fast each is going.

How many threads exist for each process?

One. If you mean now many worker processes exist for each instance of the script, there are 8 in the server process, and 4 in each client process.

Does get function called from each process thread?

Not sure what you mean by this.

Thanks for the answer I am working on it. Give me some time to process your logic. :) — Thoth, Aug 29 '14 at 09:25
So in the `mp_factorizer` we start `nprocs` processes. Each process grasps the `shared_job_q` queue and stars working on it inside the `factorizer_worker`. Since now I have understand that a process correspond to a thread that works on separate memory parts (chunks). So each thread gets (`get_nowait`) a sub-list and work the line `outdict = {n: factorize_naive(n) for n in job}`. This is happening for 8 threads (processes), almost simultaneously, on the server. All threads-processes are synchronized by `get_nowait()`. Could you please make any corrections on my logic? Thanks! — Thoth, Aug 29 '14 at 09:49
@Thoth yes, that's all correct, assuming you swap out the word "thread" for "process". There isn't any multi-threading going on here, it's all multi-processing. — dano, Aug 29 '14 at 14:17
Nice! So the last question `Does get function called from each process thread?` is now more clear either to you or to me now. :). The get function (`get_nowait`) is called by a single process/thread each time. I think the execution is more clear to me now, the basics were clarified thanks to you, a little more study is left for assimilation :)!!! — Thoth, Aug 29 '14 at 14:54

From multiprocessing to distributed processing in python standard library

1 Answers1