
Introduction

For my library, I need to split up a cuboid point lattice for assignment to multiple processing elements (PEs).

[Image: example decomposition]

As you can see, no points are shared between the PEs, and certain PEs (here 4-7) get only 2D 'slices'.

The interface accepts n1, n2, n3 and P, which are all integers. nx denotes the number of lattice points along direction x (i.e. nx is one of n1, n2, n3), and P denotes the number of PEs available.
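For concreteness, here is a minimal sketch of what such an interface could look like; the function name `decompose` and the return type are illustrative choices of mine, not the actual library API.

```cpp
#include <array>

// Hypothetical signature (illustrative only, not the actual library interface):
// given the lattice extents n1, n2, n3 and the number of available PEs P,
// return the number of PEs assigned along each direction as a triple (p1, p2, p3).
std::array<int, 3> decompose(int n1, int n2, int n3, int P);
```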

Current problem formulation

Since the domain is a cuboid, I'm looking for a 3-part-factorization of P such that P = p1 * p2 * p3 and each px is (roughly) proportional to the corresponding nx.
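To make point 1 of the limitations below concrete, a brute-force enumeration of all ordered triples whose product is P is straightforward. This is only a sketch; `factorizations3` is a name I made up.

```cpp
#include <array>
#include <vector>

// Enumerate all ordered 3-part factorizations (p1, p2, p3) with p1 * p2 * p3 == P.
std::vector<std::array<int, 3>> factorizations3(int P)
{
    std::vector<std::array<int, 3>> result;
    for (int p1 = 1; p1 <= P; ++p1)
    {
        if (P % p1 != 0) continue;
        int const q = P / p1;                    // remaining product to split over p2 and p3
        for (int p2 = 1; p2 <= q; ++p2)
        {
            if (q % p2 != 0) continue;
            result.push_back({p1, p2, q / p2});  // p3 = q / p2
        }
    }
    return result;
}
```

For P = 54 this produces 30 ordered triples, i.e. all permutations of the unordered factorizations discussed in point 2 below.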

Limitations of problem formulation

  1. A 3-part-factorization algorithm needs to be developed.
  2. Some metric must be used to choose one of the many 3-part-factorizations generated. In the example in the image, n1 = 3, n2 = 5, n3 = 4 and, say, P = 54, we would have 3-part-factorizations such as (27, 2, 1), (9, 3, 2), (6, 9, 1) and (6, 3, 3). Which one is applicable here? A simple magnitude-of-difference metric would choose (6, 3, 3), but even this choice has the permutations (3, 6, 3) and (3, 3, 6). (A selection sketch based on one possible metric follows this list.)
  3. If the chosen 3-part-factorization has a factor greater than the corresponding lattice size nx, then we would have to truncate it, which means some PEs will do no work.
  4. If the chosen 3-part-factorization has a factor (perhaps after the truncation mentioned in the previous point) which does not divide its corresponding lattice size, we get unequal work-sharing (although some PEs might be idle, as noted in the previous point).
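One way to address points 2-4 together, sketched below: instead of comparing the factors to each other, score every candidate grid by the largest local block it produces, ceil(n1/p1) * ceil(n2/p2) * ceil(n3/p3), after clamping each factor to its lattice extent, and keep a grid with minimal score. This is only one possible metric (an assumption on my part, not necessarily the right one for a 3D FFT), and it reuses the `factorizations3` sketch from above.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <vector>

std::vector<std::array<int, 3>> factorizations3(int P);   // from the sketch above

// Pick a triple (p1, p2, p3) minimizing the largest local block,
// ceil(n1/p1) * ceil(n2/p2) * ceil(n3/p3). Factors larger than the lattice
// extent are clamped, which models the idle PEs of point 3; ties are broken
// arbitrarily here.
std::array<int, 3> pick_grid(int n1, int n2, int n3, int P)
{
    std::array<int, 3> const n{n1, n2, n3};
    std::array<int, 3> best{P, 1, 1};
    long best_cost = LONG_MAX;

    for (auto const& p : factorizations3(P))    // ordered triples, so permutations are included
    {
        long cost = 1;
        for (int d = 0; d < 3; ++d)
        {
            int const pd = std::min(p[d], n[d]);   // clamp: PEs beyond n[d] stay idle
            cost *= (n[d] + pd - 1) / pd;          // ceil(n[d] / pd), the thickest slab
        }
        if (cost < best_cost)
        {
            best_cost = cost;
            best = p;
        }
    }
    return best;
}
```

For n = (3, 5, 4) and P = 54 the minimum achievable score is 2 points per PE, and several grids tie for it (e.g. (3, 3, 6) and (3, 9, 2) after clamping), so a secondary criterion (e.g. preferring grids that keep more PEs busy) could break ties.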

How the reader can help

  • How can we improve this inelegant algorithm? Could you suggest improvements to any of the points in 'Limitations of problem formulation'?
  • Is there a better and more formalized approach (i.e. problem formulation)?
  • Links to papers/websites/articles/books would be greatly appreciated.
asked by Nitin Malapally

Comments
  • I suggest pointing your favourite search engine at *efficient mpi domain decomposition* and variations on that theme. There is a lot of prior work on this. I'll just comment on your current strategy - don't ignore the possibility you might spread a problem of size `54` over a `(4,4,4)` set of PEs and leave some of them idle - that would reduce the time to solution compared with a `(6,3,3)` decomposition. – High Performance Mark Jan 18 '22 at 11:09
  • Oh, and to add to my previous comment - don't get too stuck on decomposing a global lattice of size `N` into equal pieces all of size `n`. You could decompose a problem of size `54` into `64` pieces and spread them over `(4,4,4)` processing elements. There are a lot of ways of skinning this cat. – High Performance Mark Jan 18 '22 at 11:23
  • Yes, I'm also doing a look-up while this question is (hopefully) being answered by others. The problem is that most searches yield very application-specific ideas, e.g. FEM grids, grids for numerical PDE solvers, etc. Regarding your first tip: yes, that's true, and such considerations further increase the complexity of the algorithm. But it's an important point nonetheless. Regarding padding: the presence of zeroes in my domain would actually distort the results (the end goal is a 3D-FFT). – Nitin Malapally Jan 18 '22 at 11:32
  • I didn't suggest padding, just underusing the resources on some of the PEs. – High Performance Mark Jan 18 '22 at 12:03
  • Duly noted. Thanks for the tips so far. – Nitin Malapally Jan 18 '22 at 12:14
