I am studying this code from GitHub about distributed processing. I would like to thank eliben for this nice post. I have read his explanations, but some parts are still unclear to me. As far as I understand, the code distributes tasks across multiple machines/clients. My questions are:
- The most basic one: where does the distribution of the work to different machines actually happen?
- Why is there an if/else statement in the main function?
Let me start this question in a more general way. I thought that we usually start a `Process` on a specific chunk (an independent part of memory), rather than passing all the chunks at once, like this:

```python
chunksize = int(math.ceil(len(HugeList) / float(nprocs)))
for i in range(nprocs):
    p = Process(
        target=myWorker,  # this is my worker
        args=(HugeList[chunksize * i:chunksize * (i + 1)], HUGEQ),
    )
    processes.append(p)
    p.start()
```
In this simple case we have `nprocs` processes, and each process runs an instance of the function `myWorker` on its own chunk. My question here is:
- How many threads does each process have while working on its chunk?
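To check my assumption about threads, I wrote this small experiment (`my_worker` here is just a stand-in that reports its own thread count, not the worker from the post):

```python
import threading
from multiprocessing import Process, Queue

def my_worker(chunk, result_q):
    # Report how many threads are alive inside this child process.
    # As far as I can tell, a fresh child process starts with a
    # single thread unless the target itself spawns more.
    result_q.put(threading.active_count())

if __name__ == "__main__":
    q = Queue()
    p = Process(target=my_worker, args=(list(range(10)), q))
    p.start()
    print(q.get())  # prints 1 on my machine
    p.join()
```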
Looking now into the GitHub code, I am trying to understand `mp_factorizer`. More specifically, in this function we do not have chunks but a huge queue (`shared_job_q`) that consists of sub-lists of at most 43 elements. This queue is passed into `factorizer_worker`, where via `get` we obtain those sub-lists and pass them to the serial worker. I understand that we need this queue to share data between clients. My questions here are:
- Do we call an instance of the `factorizer_worker` function for each of the `nprocs` (= 8) processes?
- Which part of the data does each process work on? (Generally, we have 8 processes and 43 chunks.)
- How many threads exist for each process?
- Is the `get` function called from each process thread?
Thanks for your time.