
I need to synchronize intermediate solutions of an optimization problem that is solved in a distributed fashion across a number of worker processes. The solution vector is known to be sparse.

I have noticed that MPI_Allreduce performs well compared to my own allreduce implementation.

However, I believe the performance could be improved further if the allreduce communicated only the nonzero entries of the solution vector. I could not find any such implementation of allreduce.

Any ideas?

It seems that MPI_Type_indexed cannot be used, as the indices of the nonzero entries are not known in advance.
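
For reference, here is a minimal sketch of the dense synchronization described above (the variable and function names are placeholders, not from my actual code):

```c
#include <mpi.h>

/* Dense synchronization: every entry of the solution vector is
   communicated, including the zeros. */
void sync_solution(double *x_local, double *x_global, int n)
{
    MPI_Allreduce(x_local, x_global, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
}
```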

Soumitra
  • There is no such function in the MPI standard. You should implement it yourself. – Hristo Iliev Dec 16 '15 at 00:23
  • I agree with @HristoIliev, but if you want to improve the performance of MPI_Allreduce() you can possibly go for an intra-node reduction, communicate one inter-node message per node, followed by an intra-node broadcast. All this will require creating intra-node and inter-node communicators. I noticed the word performance, hence this suggestion. Cheers. – Gaurav Saxena Dec 16 '15 at 05:16
  • Thanks @HristoIliev. – Soumitra Dec 16 '15 at 17:38
  • Thanks @GauravSaxena. Could you kindly point me to some base code from which I could start my own implementation? – Soumitra Dec 16 '15 at 17:40
  • @Soumitra : base code is difficult to obtain, but here are the steps: (1) obtain the node name using `MPI_Get_processor_name()`; (2) hash this name to a unique integer; (3) use the integer from step 2 as the `colour` in `MPI_Comm_split()`, which creates the intra-node communicators (the `key` can be the original `rank` so that the new ranking starts from zero); (4) say each node has `x` processes, then form a communicator of all rank-0 processes, all rank-1 processes, ..., all rank-(x-1) processes across nodes, i.e. the inter-node communicators; (5) intra-node `MPI_Reduce()`; (6) use _any_ inter-node communicator for `MPI_Allreduce()`; (7) intra-node `MPI_Bcast()`. – Gaurav Saxena Dec 18 '15 at 14:05
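
Here is one possible sketch of the hierarchical scheme described in the comment above (the hashing helper and the name `hierarchical_allreduce_sum` are placeholders, and a sum over doubles is assumed):

```c
#include <mpi.h>

/* djb2-style string hash used to turn the node name into a non-negative
   colour for MPI_Comm_split(). A hash collision would merge two nodes into
   one "intra-node" communicator, which this sketch simply assumes away. */
static int hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return (int)(h & 0x7fffffff);
}

void hierarchical_allreduce_sum(double *sendbuf, double *recvbuf, int count,
                                MPI_Comm comm)
{
    int world_rank;
    MPI_Comm_rank(comm, &world_rank);

    /* Steps (1)-(3): build an intra-node communicator from the node name. */
    char name[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Get_processor_name(name, &len);
    MPI_Comm node_comm;
    MPI_Comm_split(comm, hash_name(name), world_rank, &node_comm);

    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* Step (4): one inter-node communicator per local rank; only the one
       holding the node leaders (local rank 0) is actually used below. */
    MPI_Comm leader_comm;
    MPI_Comm_split(comm, node_rank, world_rank, &leader_comm);

    /* Step (5): intra-node reduction to the node leader. */
    MPI_Reduce(sendbuf, recvbuf, count, MPI_DOUBLE, MPI_SUM, 0, node_comm);

    /* Step (6): inter-node Allreduce among the node leaders only. */
    if (node_rank == 0)
        MPI_Allreduce(MPI_IN_PLACE, recvbuf, count, MPI_DOUBLE, MPI_SUM,
                      leader_comm);

    /* Step (7): intra-node broadcast of the final result. */
    MPI_Bcast(recvbuf, count, MPI_DOUBLE, 0, node_comm);

    MPI_Comm_free(&node_comm);
    MPI_Comm_free(&leader_comm);
}
```

In practice the communicators would be created once and reused across iterations rather than rebuilt on every call.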

2 Answers


There are no sparse collectives in MPI. It's something that the MPI Forum has discussed in the past (to what end, I don't know), and there has also been research in the area. Usually, though, when these sorts of things are discussed in the Forum, I believe they relate to collectives that don't involve all of the processes rather than all of the data.

As Hristo said in the comments, the goal of MPI (according to some) has always been to enable more optimized tricks on top of MPI and to use it as a low-level library that abstracts the communication calls. Obviously, this isn't how MPI has actually been used most of the time, but you can still write your own sparse collectives. Sounds like a good paper to me.
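
For example, a user-level "sparse allreduce" can be built from existing collectives: each rank contributes only its nonzero (index, value) pairs, the pairs are gathered everywhere with `MPI_Allgatherv()`, and each rank sums them into a dense result. A minimal sketch (the name `sparse_allreduce_sum` is made up, not part of any library, and error checking is omitted):

```c
#include <mpi.h>
#include <stdlib.h>

void sparse_allreduce_sum(const double *x, double *result, int n, MPI_Comm comm)
{
    int size;
    MPI_Comm_size(comm, &size);

    /* Pack the local nonzeros as parallel index/value arrays. */
    int    *idx = malloc(n * sizeof(int));
    double *val = malloc(n * sizeof(double));
    int nnz = 0;
    for (int i = 0; i < n; ++i)
        if (x[i] != 0.0) { idx[nnz] = i; val[nnz] = x[i]; ++nnz; }

    /* Exchange the nonzero counts so every rank can size its receive buffers. */
    int *counts = malloc(size * sizeof(int));
    MPI_Allgather(&nnz, 1, MPI_INT, counts, 1, MPI_INT, comm);

    int *displs = malloc(size * sizeof(int));
    int total = 0;
    for (int r = 0; r < size; ++r) { displs[r] = total; total += counts[r]; }

    /* Gather all (index, value) pairs on every rank. */
    int    *all_idx = malloc(total * sizeof(int));
    double *all_val = malloc(total * sizeof(double));
    MPI_Allgatherv(idx, nnz, MPI_INT,    all_idx, counts, displs, MPI_INT,    comm);
    MPI_Allgatherv(val, nnz, MPI_DOUBLE, all_val, counts, displs, MPI_DOUBLE, comm);

    /* Local reduction into the dense output vector. */
    for (int i = 0; i < n; ++i) result[i] = 0.0;
    for (int k = 0; k < total; ++k) result[all_idx[k]] += all_val[k];

    free(idx); free(val); free(counts); free(displs);
    free(all_idx); free(all_val);
}
```

This only pays off when the combined number of nonzeros stays well below the vector length, since it replaces the reduction tree of `MPI_Allreduce()` with an allgather of every rank's contribution.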

Wesley Bland
  • Thanks @WesleyBland. More so for the paper. To be more optimistic: do you know of any implementation of the ideas in the paper? – Soumitra Dec 16 '15 at 17:42
  • I don't know of any, but in order to publish the paper, I'm sure there was a research quality implementation. – Wesley Bland Dec 16 '15 at 17:43

I had a similar problem. Most likely you will need to implement your own custom MPI_Allreduce().

There is an optimized implementation here; very possibly you have already found this link: https://fs.hlrs.de/projects/par/mpi//myreduce.html
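
If you need a bare starting point for rolling your own, the sketch below is a hand-rolled allreduce built from point-to-point calls using recursive doubling. For brevity it assumes the number of ranks is a power of two and a sum over doubles (the name `my_allreduce_sum` is just a placeholder); the page linked above describes considerably more refined algorithms:

```c
#include <mpi.h>
#include <stdlib.h>

/* Hand-rolled allreduce (in place on buf) via recursive doubling.
   Assumes the communicator size is a power of two. */
void my_allreduce_sum(double *buf, int n, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    double *tmp = malloc(n * sizeof(double));

    /* In each round, exchange the full buffer with the partner whose rank
       differs in one bit, then fold in the partner's contribution. */
    for (int dist = 1; dist < size; dist <<= 1) {
        int partner = rank ^ dist;
        MPI_Sendrecv(buf, n, MPI_DOUBLE, partner, 0,
                     tmp, n, MPI_DOUBLE, partner, 0,
                     comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < n; ++i)
            buf[i] += tmp[i];
    }

    free(tmp);
}
```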

If you want ideas for a higher-performance implementation, there are some here:

https://dl.acm.org/citation.cfm?id=2642791
https://dl.acm.org/citation.cfm?id=2642773

Note that they don't provide an implementation, and you may need to pay a small fee to access them.

Good luck

user9869932