7

I have code with heavy symbolic calculations (many symbolic multiple integrals). I also have access to both an 8-core CPU computer (with 18 GB RAM) and a small 32-CPU cluster. I would prefer to stay on my professor's 8-core PC rather than go to another professor's lab to use his cluster during a more limited time slot; however, I'm not sure it will work on the SMP system. So I am looking for a parallel tool in Python that can be used on both SMP machines and clusters, and I would of course prefer that code written for one system be modifiable for the other with the least possible effort.

So far I have found Parallel Python (PP) promising for my needs, but I was recently told that MPI (pyMPI or MPI4py) does the same thing. I couldn't confirm this, since very little seems to be discussed about it on the web; only here is it stated that MPI (both pyMPI and MPI4py) is usable on clusters only, if I am right about that "only"!

Is "Parallel Python" my only choice, or I can also happily use MPI based solutions? Which one is more promising for my needs?

PS. It seems that neither of them has very comprehensive documentation, so if you know of any resources other than their official websites that could help a newbie with parallel computation, I would be grateful if you could mention them in your answer :)


Edit.

My code has two loops, one inside the other. The outer loop cannot be parallelized, as it is an iterative method (a recursive solution) in which each step depends on the values calculated in the previous step. The outer loop contains the inner loop alongside 3 extra equations whose calculations depend on the complete results of the inner loop. However, the inner loop (which contains 9 of the 12 equations computable at each step) can be safely parallelized: all 3*3 equations are independent of each other, depending only on the previous step. All my equations are computationally heavy, as each contains many symbolic multiple integrals. It seems I can parallelize both the inner loop's 9 equations and the integration calculations inside each of these 9 equations, and also parallelize all the integrations in the other 3 equations alongside the inner loop. You can find my code here if it helps you understand my need better; it is written in SageMath.
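Schematically, the structure looks roughly like this (a toy stand-in with placeholder computations, not my actual equations):

# toy stand-in for the structure described above; the real equations are symbolic integrals
def inner_equation(i, state):                   # one of the 9 independent equations
    return state + i                            # placeholder computation

def extra_equation(j, state, inner_results):    # one of the 3 equations needing all inner results
    return sum(inner_results) + j               # placeholder computation

state = 0.0
for step in range(10):                                        # outer loop: sequential by nature
    inner = [inner_equation(i, state) for i in range(9)]      # these 9 can run in parallel
    extra = [extra_equation(j, state, inner) for j in range(3)]
    state = sum(inner) + sum(extra)                           # result feeds the next outer step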

owari
    I don't know if this is relevant or not because it is only a companion option to your total solution, but I thought I would add a comment... [ZeroMQ](http://www.zeromq.org/) is a way to develop whatever your "workers" are in a way that is transparent in scalability. You can start with local processes, and simply expand it to workers that are connected via sockets. The code logic remains the same. – jdi Nov 29 '12 at 00:54
  • @jdi, thanks for the link I'm reading about it. – owari Nov 29 '12 at 01:08
  • The best part to read right now is the "guide". Its this amazing explanation of the concept of threading and scalability: http://zguide.zeromq.org/page:all . The multiprocessing part comes from your own implementation, connected via ZeroMQ for communication – jdi Nov 29 '12 at 01:12
  • MPI can be used on a single computer, very easily. However, it probably isn't the solution you want unless you're doing multi-language programs or have some independent reason to learn MPI, because there are more concepts to learn than with, say, multiprocessing. – abarnert Nov 29 '12 at 01:25
  • Do you make heavy use of `numpy`/`scipy`? If so, I'd look at the solutions that are tied into that ecosystem to see if they help with, e.g., partitioning and distributing your data. – abarnert Nov 29 '12 at 01:34
  • @jdi, zmq seems interesting, mostly because it tries to be minimalistic and simple, but as a newbie how can I choose between it, multiprocessing and PP? A short hint would be highly appreciated :) – owari Nov 29 '12 at 05:42
  • @abarnert, Thanks very much, now I know MPI is not for me ;). I am using Python in SageMath and mainly use its libraries, and if I have any success I'd prefer not to use any numerical algorithms in my code (specifically for the multiple integrations), hence I have no use for scipy right now! – owari Nov 29 '12 at 05:49
  • zmq would be the form of coordination and communication between the components of your system. Let's say you decide to use the `multiprocessing` module. You can have 8 worker processes go into a work loop accepting jobs, communicating down a zmq push/pull. They just consume work and communicate the results back out on another zmq socket. If you wanted to expand this to 100 network machines, they simply connect on that same push/pull to accept work. The system doesn't see the difference. The python multiprocessing module is really all you need. You can even have workers in other languages connect. – jdi Nov 29 '12 at 06:29
  • Is your parallelism naturally based around short independent jobs, long-running jobs with explicit messaging, or a single job with a huge but easily-partitionable data set? It sounds like it's probably the first or the last, but I'm not sure which. – abarnert Nov 29 '12 at 11:46
  • @owari: why not give IPython's parallel computing support a try; it is very easy to set up and it fits your requirements. Also, as already discussed in the comments above, it includes `zeromq`; http://ipython.org/ipython-doc/rel-0.13.1/parallel/index.html – namit Nov 29 '12 at 11:50
  • @NamitKewat, I was thinking the interactive thing would be problematic, as my code is now written in SageMath. I'm getting confused, then, whether IPython is another good choice besides Multiprocessing. Can you please take a look at my code [here](http://ask.sagemath.org/question/1661/how-to-speed-up-a-code-containing-several-symbolic) to see whether IPython can do the job? Specifically, is it easier and more direct to use compared to Multiprocessing? – owari Nov 29 '12 at 17:10
  • There's just too much information here to reply in comments on two threads, so I added them as an answer, even though it's not really an answer. I should mention that I've done very little playing with IPython's parallel features, so I can't compare it to `multiprocessing`. – abarnert Nov 29 '12 at 21:43
  • @Owari: running your code in IPython is challenging because of Sage! But your code can easily be translated to a scipy/numpy/matplotlib equivalent; if you can do so, then the rest is very easy. – namit Nov 30 '12 at 03:02
  • @NamitKewat, thank you very much; actually I am not very good at programming, so I prefer not to start a new challenge, also for lack of time. Thanks for your helpful comment! – owari Nov 30 '12 at 07:55

3 Answers

3

I would look into multiprocessing (doc), which provides a bunch of nice tools for spawning and working with sub-processes.

To quote the documentation:

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.

From the comments I think Pool and its map would serve your purposes (doc).

from multiprocessing import Pool

def work_done_in_inner_loop(arg):
    # put your work code here
    pass

p = Pool(9)  # one worker per independent inner equation
for o in outer_loop:
    # whatever else you do at this outer step
    list_of_args = [...]  # what your inner loop currently loops over
    res = p.map(work_done_in_inner_loop, list_of_args)
    # rest of code (e.g. the remaining equations, using res)
tacaswell
  • As far as I understand, multiprocessing is usable on SMP and clouds, so I was thinking that if it can be used on clouds it should also be usable on local clusters, but nowhere have I found anything about that. You mean it can be used on clusters as well? And is it easier to handle than PP? Thanks for your answer! – owari Nov 29 '12 at 01:06
  • `multiprocessing` can run on a collection of independent computers, each running a "remote manager". It doesn't matter whether they're in a cloud or in a local cluster, as long as you can connect sockets between them. This makes things very simple—but it does mean that if you've got any fancy features in your cluster, `multiprocessing` won't take any advantage of them. – abarnert Nov 29 '12 at 01:30
  • @abarnert, so multiprocessing is not a generally good choice for cluster computations, right? Is it any better than PP? Simpler or more functional, I mean. Also, what do you mean by a remote manager? Is it an independent application, like FileZilla for example? Sorry for the noob question. – owari Nov 29 '12 at 05:53
  • A remote manager is just a Python instance, nothing complicated (a minimal sketch of the setup is included after these comments). And `multiprocessing` is a good choice for cluster computations if your cluster is basically a grid with fast local networking; not so good if you want to take advantage of master/slave load-balancing, hardware PVM acceleration, special features of cluster filesystems, etc. – abarnert Nov 29 '12 at 11:40
  • @owari you should move the last two comments into your question. – tacaswell Nov 29 '12 at 19:45
  • @abarnert, I added an "Edit" paragraph to my original question that describes my problem better: I can parallelize my equations with respect to each other and then parallelize the calculation of the integrations inside each equation. It might be the master/slave load-balancing that you mentioned, I don't know; maybe both child and sub-child processes need to be defined, and maybe it cannot be handled with child processes alone. Thanks for your time :) – owari Nov 29 '12 at 20:16
  • @tcaswell, thank you very much, I appreciate your kind help. I'm now trying to use your code, but maybe first I should read some more about the module itself and especially the Pool and its map. – owari Nov 29 '12 at 20:22
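
For reference, here is a minimal sketch of the "remote manager" setup mentioned in the comments above, following the pattern from the multiprocessing documentation; the host, port, authkey, and the work function are placeholders. One machine serves two shared queues, and any number of workers, on the same box or across the cluster, connect to them:

# server.py: run on one machine; it only hands out work and collects results
from multiprocessing.managers import BaseManager
import queue

job_queue = queue.Queue()
result_queue = queue.Queue()

class QueueManager(BaseManager):
    pass

QueueManager.register('get_jobs', callable=lambda: job_queue)
QueueManager.register('get_results', callable=lambda: result_queue)

manager = QueueManager(address=('', 50000), authkey=b'secret')  # port and key are placeholders
manager.get_server().serve_forever()

# worker.py: run as many copies as you like, locally or on cluster nodes
from multiprocessing.managers import BaseManager

def do_work(arg):
    return arg * arg  # placeholder for the real heavy computation

class QueueManager(BaseManager):
    pass

QueueManager.register('get_jobs')
QueueManager.register('get_results')

manager = QueueManager(address=('server-host', 50000), authkey=b'secret')  # placeholder host
manager.connect()
jobs, results = manager.get_jobs(), manager.get_results()
while True:
    results.put(do_work(jobs.get()))  # blocks until the server-side queue has work

Your main program connects in the same way, puts its arguments on the job queue and reads the result queue; nothing in the workers changes when they move from local processes to cluster nodes.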
1

It seems like there are a few reasonable ways to design this.

Let me refer to your jobs as the main job, the 9 intermediate jobs, and the many inner jobs the intermediate jobs can spin off. I'm assuming the intermediate jobs have a "merge" step after the inner jobs all finish, and the same for the outer job.

The simplest design is that the main job fires off the intermediate jobs and then waits for them all to finish before doing its merge step. The intermediate jobs then fire off the inner jobs and wait for them all to finish before doing their merge steps.

This can work with a single shared queue, but you need a queue that doesn't block the worker pool while waiting, and I don't think multiprocessing's Pool and Queue can do that out of the box. As soon as you've got all of your processes waiting to join their children, nothing gets done.

One way around that is to change to a continuation-passing style. If you know which of the intermediate jobs will finish last, you can pass it the handles to the other intermediate jobs and have it join on them and do the merge, instead of the outer job. The intermediate jobs similarly pass off the merge to their last inner job.

The problem is that you usually have no way of knowing what's going to finish last, even without scheduling issues. So that means you need some form of either sharing (e.g., a semaphore) or message passing between the jobs to negotiate that among themselves. You can do that on top of multiprocessing. The only problem is that it destroys the independence of your jobs, and you're suddenly dealing with all the annoying problems of shared concurrency.

A different alternative is to have separate pools and queues for each intermediate job, and some kind of load balancing between the pools that can ensure that each core is running one active process.

Or, of course, a single pool with a more complicated implementation than multiprocessing's, which does either load balancing or cooperative scheduling, so a joiner doesn't block a core.

Or a super-simple solution: Overschedule, and pay a little cost in context switching for simplicity. For example, you can run 32 workers even though you've only got 8 cores, so you've got 22 active workers and 10 waiting. Each core has 2 or 3 active workers, which will slow things down a bit, but maybe not too badly—and at least nobody's idle, and you didn't have to write any code beyond passing a different parameter to the multiprocessing.Pool constructor.
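
For what it's worth, here is a rough sketch along the lines of those last two options, with placeholder functions and arbitrary sizes: each of the 9 intermediate jobs is an ordinary Process running its own small Pool for its inner work, so the total worker count deliberately exceeds the core count, and the parents that are just waiting cost almost nothing:

from multiprocessing import Process, Pool, Queue

def inner_integral(arg):
    # placeholder for one heavy symbolic integral
    return arg * arg

def intermediate_job(job_id, out_queue):
    # each intermediate equation gets its own small pool of inner workers;
    # 9 jobs * 4 workers is deliberately more than 8 cores
    pool = Pool(processes=4)
    inner_results = pool.map(inner_integral, range(10))  # placeholder inner tasks
    pool.close()
    pool.join()
    out_queue.put((job_id, sum(inner_results)))  # this job's "merge" step

if __name__ == '__main__':
    out_queue = Queue()
    jobs = [Process(target=intermediate_job, args=(i, out_queue)) for i in range(9)]
    for p in jobs:
        p.start()
    merged = [out_queue.get() for _ in jobs]  # collect the 9 merged results before joining
    for p in jobs:
        p.join()
    # the outer job's merge step (the remaining 3 equations) would go here, using `merged`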

At any rate, multiprocessing is very simple, and it has almost no extra concepts that won't apply to other solutions. So it may take less time to play with it until you run into a brick wall or don't, than to try to figure out in advance whether it'll work for you.

abarnert
  • thanks, your answer is so helpful and enlightening for me :) your last two recommendations seem more intuitive and thus more understandable to me. – owari Nov 30 '12 at 01:32
-1

I recently ran into a similar problem. However, the following solution is only valid if (1) you wish to run the python script individually on a group of files, AND (2) each invocation of the script is independent of the others.

If the above applies to you, the simplest solution is to write a wrapper in bash along the lines of:

for a_file in $list_of_files
do
    python python_script.py "$a_file" &
done

The '&' will run the preceding command as a sub-process. The advantage is that bash will not wait for the python script to finish before continuing with the for loop.

You may want to place a cap on the number of processes running simultaneously, since this code will use all available resources.
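
If you would rather have that cap without hand-rolling the bookkeeping in bash, a rough Python equivalent is to drive the same independent invocations through a small multiprocessing Pool (the script name and file list here are placeholders):

from multiprocessing import Pool
import subprocess

def run_script(path):
    # one independent invocation of the script per input file
    return subprocess.call(['python', 'python_script.py', path])

if __name__ == '__main__':
    files = ['input1.dat', 'input2.dat', 'input3.dat']  # placeholder file list
    pool = Pool(processes=8)  # cap: at most 8 invocations run at once
    exit_codes = pool.map(run_script, files)
    pool.close()
    pool.join()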

mbattifarano
  • This doesn't seem really well suited for what the OP is asking. It is just a bunch of completely independent processes running from external input args. The idea for multiprocessing is to be able to take a problem and split it up, and then join the results back. This is just a primitive bash wrapper to process a "void" return operation on a list of files. – jdi Nov 29 '12 at 02:11
  • @immattbatt, thanks for your solution. I may be able to put each integration that needs to be computed separately in its own file, but that would be quite difficult and time-consuming. Also, I'm not sure how this answer would parallelize my code, as different integrations sometimes use similar data, for example different functionals of the same function found in a previous step. However, thanks for your time and effort :) – owari Nov 29 '12 at 05:59