
Introduction

For my library, I need to split up a cuboid point lattice for assignment to multiple processing elements (PEs).

[Image: example decomposition]

As you can see, no points are shared between the PEs, and certain PEs (here 4-7) get only 2D 'slices'.

The interface accepts n1, n2, n3 and P, which are all integers. nx denotes the number of lattice points along direction x (i.e. nx is one of n1, n2, n3), and P denotes the number of PEs available.
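For concreteness, here is a minimal sketch of what such an interface could look like; the function name `decompose` and the return type are illustrative choices of mine, not the actual library API.

```cpp
#include <array>

// Hypothetical signature (illustrative only, not the actual library interface):
// given the lattice extents n1, n2, n3 and the number of available PEs P,
// return the number of PEs assigned along each direction as a triple (p1, p2, p3).
std::array<int, 3> decompose(int n1, int n2, int n3, int P);
```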

Current problem formulation

Since the domain is a cuboid, I'm looking for a 3-part-factorization of P such that P = p1 * p2 * p3 and each px is (roughly) proportional to the corresponding nx.
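To make point 1 of the limitations below concrete, a brute-force enumeration of all ordered triples whose product is P is straightforward. This is only a sketch; `factorizations3` is a name I made up.

```cpp
#include <array>
#include <vector>

// Enumerate all ordered 3-part factorizations (p1, p2, p3) with p1 * p2 * p3 == P.
std::vector<std::array<int, 3>> factorizations3(int P)
{
    std::vector<std::array<int, 3>> result;
    for (int p1 = 1; p1 <= P; ++p1)
    {
        if (P % p1 != 0) continue;
        int const q = P / p1;                    // remaining product to split over p2 and p3
        for (int p2 = 1; p2 <= q; ++p2)
        {
            if (q % p2 != 0) continue;
            result.push_back({p1, p2, q / p2});  // p3 = q / p2
        }
    }
    return result;
}
```

For P = 54 this produces 30 ordered triples, i.e. all permutations of the unordered factorizations discussed in point 2 below.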

Limitations of problem formulation

  1. A 3-part-factorization algorithm needs to be developed.
  2. Some metric must be used to choose one of the many 3-part-factorizations generated. In the example in the image, n1 = 3, n2 = 5, n3 = 4 and, say, P = 54, we would have 3-part-factorizations such as (27, 2, 1), (9, 3, 2), (6, 9, 1) and (6, 3, 3). Which one is applicable here? A simple magnitude-of-difference metric would choose (6, 3, 3), but even this choice has the permutations (3, 6, 3) and (3, 3, 6). (A selection sketch based on one possible metric follows this list.)
  3. If the chosen 3-part-factorization has a factor greater than the corresponding lattice size nx, then we would have to truncate it, which means some PEs will do no work.
  4. If the chosen 3-part-factorization has a factor (perhaps after the truncation mentioned in the previous point) which does not divide its corresponding lattice size, we get unequal work-sharing (although some PEs might be idle, as noted in the previous point).
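One way to address points 2-4 together, sketched below: instead of comparing the factors to each other, score every candidate grid by the largest local block it produces, ceil(n1/p1) * ceil(n2/p2) * ceil(n3/p3), after clamping each factor to its lattice extent, and keep a grid with minimal score. This is only one possible metric (an assumption on my part, not necessarily the right one for a 3D FFT), and it reuses the `factorizations3` sketch from above.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <vector>

std::vector<std::array<int, 3>> factorizations3(int P);   // from the sketch above

// Pick a triple (p1, p2, p3) minimizing the largest local block,
// ceil(n1/p1) * ceil(n2/p2) * ceil(n3/p3). Factors larger than the lattice
// extent are clamped, which models the idle PEs of point 3; ties are broken
// arbitrarily here.
std::array<int, 3> pick_grid(int n1, int n2, int n3, int P)
{
    std::array<int, 3> const n{n1, n2, n3};
    std::array<int, 3> best{P, 1, 1};
    long best_cost = LONG_MAX;

    for (auto const& p : factorizations3(P))    // ordered triples, so permutations are included
    {
        long cost = 1;
        for (int d = 0; d < 3; ++d)
        {
            int const pd = std::min(p[d], n[d]);   // clamp: PEs beyond n[d] stay idle
            cost *= (n[d] + pd - 1) / pd;          // ceil(n[d] / pd), the thickest slab
        }
        if (cost < best_cost)
        {
            best_cost = cost;
            best = p;
        }
    }
    return best;
}
```

For n = (3, 5, 4) and P = 54 the minimum achievable score is 2 points per PE, and several grids tie for it (e.g. (3, 3, 6) and (3, 9, 2) after clamping), so a secondary criterion (e.g. preferring grids that keep more PEs busy) could break ties.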

How the reader can help

  • How can we improve this inelegant algorithm? Could you suggest improvements to any of the points in 'Limitations of problem formulation'?
  • Is there a better and more formalized approach (i.e. problem formulation)?
  • Links to papers/websites/articles/books would be greatly appreciated.
asked by Nitin Malapally

Comments
  • I suggest pointing your favourite search engine at *efficient mpi domain decomposition* and variations on that theme. There is a lot of prior work on this. I'll just comment on your current strategy - don't ignore the possibility you might spread a problem of size `54` over a `(4,4,4)` set of PEs and leave some of them idle - that would reduce the time to solution compared with a `(6,3,3)` decomposition. – High Performance Mark Jan 18 '22 at 11:09
  • Oh, and to add to my previous comment - don't get too stuck on decomposing a global lattice of size `N` into equal pieces all of size `n`. You could decompose a problem of size `54` into `64` pieces and spread them over `(4,4,4)` processing elements. There are a lot of ways of skinning this cat. – High Performance Mark Jan 18 '22 at 11:23
  • Yes, I'm also doing a look-up while this question is (hopefully) being answered by others. The problem is that most searches yield very application-specific ideas, e.g. FEM grids, grids for numerical PDE solvers, etc. Regarding your first tip: yes, that's true, and such considerations further increase the complexity of the algorithm. But it's an important point nonetheless. Regarding padding: the presence of zeroes in my domain would actually distort the results (the end goal is a 3D-FFT). – Nitin Malapally Jan 18 '22 at 11:32
  • I didn't suggest padding, just underusing the resources on some of the PEs. – High Performance Mark Jan 18 '22 at 12:03
  • Duly noted. Thanks for the tips so far. – Nitin Malapally Jan 18 '22 at 12:14
