
I'm currently working with a program written in C with MPI parallelization. The computational grid is divided up using common domain decomposition techniques. The process layout with respect to the 2D decomposition is as follows (simplified):

/////////////////////////////
/       /         /         /
/   1   /    2    /    3    /
/       /         /         /
/////////////////////////////
/       /         /         /
/   4   /    5    /    6    /
/       /         /         /
/////////////////////////////
/       /         /         /
/   7   /    8    /    9    /
/       /         /         /
/////////////////////////////
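For context, the 2D layout is built with MPI's built-in Cartesian topology. A minimal sketch of the setup (the sizes and names here are illustrative, not my actual code):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* 3x3 process grid matching the layout above, non-periodic. */
    int dims[2] = {3, 3}, periods[2] = {0, 0}, coords[2], rank;
    MPI_Comm cart2d;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart2d);

    MPI_Comm_rank(cart2d, &rank);
    MPI_Cart_coords(cart2d, rank, 2, coords);   /* (row, col) of this rank */
    printf("rank %d -> (row %d, col %d)\n", rank, coords[0], coords[1]);

    MPI_Finalize();
    return 0;
}
```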

At one point in the code, I have to solve a series of equations that have x-dependencies only. With the topology in its current form, this can only be parallelized with 3 processes at a time due to the x-dependency, which leads to my question: is there a convenient/efficient way to map the current topology to another one within the code, one that favors full parallelization, i.e., using all 9 processes? For example, something like this:

/////////////////////////////
/            1              /
/////////////////////////////
/            2              /
/////////////////////////////
/            3              /
/////////////////////////////
/            4              /
/////////////////////////////
/            5              /
/////////////////////////////
/            6              /
/////////////////////////////
/            7              /
/////////////////////////////
/            8              /
/////////////////////////////
/            9              /
/////////////////////////////

One might ask why I don't start with this layout. Well, the 2D domain decomposition is much more efficient for the overall problem, and I also have y-dependencies later where I need to do something similar with the topology, except the image above would be transposed.

So, I need to map the 2D topology to the 1D topology within the code (on the fly) using some communication routines to enable full parallelization with 9 processes, but I'm not sure whether there is a way of doing this that is efficient and effective compared with running the original problem on only 3 processes in parallel. Any suggestions would be helpful. Thanks!!
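To make the data movement concrete, here is roughly the kind of remap I have in mind. This is only a sketch under assumptions of my own: 9 ranks, a global N x N grid with N divisible by 9, row-major storage, ranks numbered row-major in the 2D layout (rank = row*3 + col), and each rank's block stored contiguously. Under those assumptions the whole remap reduces to one MPI_Alltoallv plus a local unpack:

```c
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define N 18                 /* example global size, divisible by 9 */
#define P 3                  /* process grid is P x P                */
#define B (N / P)            /* 2D block edge: each rank owns B x B  */
#define S (N / (P * P))      /* 1D strip height: S rows of N columns */

void block_to_strips(double *block, double *strip, MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);
    if (nprocs != P * P) MPI_Abort(comm, 1);   /* sketch assumes 9 ranks */

    int row = rank / P;                        /* my 2D process row */

    int scounts[P * P] = {0}, sdispls[P * P] = {0};
    int rcounts[P * P] = {0}, rdispls[P * P] = {0};

    /* My B x B block splits into P horizontal slabs of S rows each;
     * slab k is a contiguous S*B chunk and goes to 1D rank row*P + k. */
    for (int k = 0; k < P; ++k) {
        int dest = row * P + k;
        scounts[dest] = S * B;
        sdispls[dest] = k * S * B;
    }

    /* As 1D rank `rank`, I receive one S x B piece from each of the
     * P ranks in 2D process-row rank/P. */
    double *rbuf = malloc((size_t)P * S * B * sizeof(double));
    for (int c = 0; c < P; ++c) {
        int src = (rank / P) * P + c;
        rcounts[src] = S * B;
        rdispls[src] = c * S * B;
    }

    MPI_Alltoallv(block, scounts, sdispls, MPI_DOUBLE,
                  rbuf,  rcounts, rdispls, MPI_DOUBLE, comm);

    /* Interleave the P received pieces into my S x N strip:
     * piece c carries columns [c*B, (c+1)*B) of every strip row. */
    for (int c = 0; c < P; ++c)
        for (int i = 0; i < S; ++i)
            memcpy(&strip[i * N + c * B],
                   &rbuf[c * S * B + i * B],
                   B * sizeof(double));
    free(rbuf);
}
```

The reverse remap (strips back to blocks) would be the same pattern with the send and receive roles swapped, and the y-dependent version would be its transpose.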

ThatsRightJack
  • I presume no. Are you an administrator or a user of the grid? As far as I can tell, a user might want to just settle for the topology and go straight to the result. However, AFAIK this problem is about middleware. – Yang Apr 11 '13 at 02:28
  • What do you mean by "map the topology" - are you talking about simply reordering the processes, or transferring data, or...? Code would help. You can create as many [MPI Topologies](https://computing.llnl.gov/tutorials/mpi/#Virtual_Topologies) as you like over the set of processes, and both of these are simple cases of the built-in Cartesian topology, but I'm not sure what you're going to do with them. – Jonathan Dursi Apr 11 '13 at 02:28
  • @JonathanDursi that's why I asked about his role. In principle a topology shift is feasible, but if he cares more about the result, I wouldn't suggest it. – Yang Apr 11 '13 at 02:37
  • ...and it also depends on the intensity of computation in the pieces of your code that you are going to run in either the 1D or 2D layout. If you know which one leads to the performance bottleneck, then you can decide. – Yang Apr 11 '13 at 02:43
  • First off, thanks for getting back to me. I'll try to elaborate more on the problem. The main computation grid is 2D. Using MPI, I decompose the computation grid into 2D subdomains as shown above. Each MPI process reads in its respective section of the computation grid and performs calculations. At one point in the computation, there is a series of x-dependent calculations that need to be performed (something like x[i] = x[i-1]). With the 2D decomposition shown above, I can only run 3 processes in parallel (columns), then I need to transfer halo cells, etc. – ThatsRightJack Apr 12 '13 at 06:16
  • In reference to the first image above, this means processes 1, 4, 7 would calculate x[i]=x[i-1]. Then I would have to exchange halo cells. Then processes 2, 5, 8 would calculate x, etc. (this pipelined sweep is sketched after these comments). – ThatsRightJack Apr 12 '13 at 06:31
  • Is there a way to transform the original 2D decomposition in the first image into the 1D decomposition in the second image? If so, I could run processes 1-9 to solve for x[i]=x[i-1]. When I said "map the topology", I meant something like a 1-to-1 transformation to send the data from the 2D decomposition to the 1D decomposition easily. – ThatsRightJack Apr 12 '13 at 06:39
  • The images above show the MPI process layout where the data is local. That being said, process 1 in the first image contains data that needs to be transferred to processes 1 and 2 in the second image. Just think of it as two different stencils that map onto the same computation grid, where each subdomain is an MPI process. I would like to go from one stencil to the other efficiently. – ThatsRightJack Apr 12 '13 at 06:50
  • I know there will be communication routines involved and data movement, but I'm searching for an efficient way to do so. I just thought this might be a problem someone has faced in the past. It may even turn out that doing nothing and computing in serial is better. – ThatsRightJack Apr 12 '13 at 06:51
  • In this case, you need to repartition your data so that each processor has a piece to compute. It's not really an MPI problem but an algorithmic one. – ipapadop Jun 08 '13 at 01:10
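For reference, a minimal sketch of the pipelined sweep described in the comments above, assuming a 3x3 Cartesian communicator like cart2d from the first sketch and assuming x runs along Cartesian dimension 1 (only one grid row is shown for brevity):

```c
#include <mpi.h>

/* Pipelined x-sweep over the 2D layout: each process waits for the
 * boundary value from its left neighbor, runs the x[i] = x[i-1]
 * recurrence over its local cells, then forwards its last value to
 * the right.  Only one process column is active at a time, which is
 * why only 3 processes compute concurrently. */
void x_sweep(double *x, int nloc, MPI_Comm cart2d)
{
    int left, right;
    MPI_Cart_shift(cart2d, 1, 1, &left, &right);  /* x = dimension 1 here */

    if (left != MPI_PROC_NULL)            /* receive upstream halo value */
        MPI_Recv(&x[0], 1, MPI_DOUBLE, left, 0, cart2d, MPI_STATUS_IGNORE);

    for (int i = 1; i < nloc; ++i)        /* serial recurrence within rank */
        x[i] = x[i - 1];

    if (right != MPI_PROC_NULL)           /* forward my boundary downstream */
        MPI_Send(&x[nloc - 1], 1, MPI_DOUBLE, right, 0, cart2d);
}
```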

0 Answers