MPI_Bcast
sends the message from one process (the 'root') to all others, by definition. It will probably also be faster than looping over all processes with point-to-point sends: the MPICH2
implementation, for instance, uses a binomial tree to distribute the message, so the broadcast finishes in O(log P) communication steps instead of O(P).
In case you don't want to broadcast to MPI_COMM_WORLD, but want to define subgroups of processes, you can go about it like this:
#include <stdio.h>
#include "mpi.h"

#define NPROCS 8   /* this example assumes it is run with exactly 8 processes */

int main(int argc, char **argv)
{
    int rank, new_rank, sendbuf, recvbuf;
    int ranks1[4] = {0, 1, 2, 3}, ranks2[4] = {4, 5, 6, 7};
    MPI_Group orig_group, new_group;
    MPI_Comm new_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    sendbuf = rank;

    /* Extract the original group handle */
    MPI_Comm_group(MPI_COMM_WORLD, &orig_group);

    /* Divide tasks into two groups based on rank */
    if (rank < NPROCS/2)
        MPI_Group_incl(orig_group, NPROCS/2, ranks1, &new_group);
    else
        MPI_Group_incl(orig_group, NPROCS/2, ranks2, &new_group);

    /* Create the new communicator, then perform some collective communication.
     * Here MPI_Allreduce, but you can MPI_Bcast at will. */
    MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);
    MPI_Allreduce(&sendbuf, &recvbuf, 1, MPI_INT, MPI_SUM, new_comm);

    MPI_Group_rank(new_group, &new_rank);
    printf("rank= %d newrank= %d recvbuf= %d\n", rank, new_rank, recvbuf);

    /* Release the handles we created */
    MPI_Comm_free(&new_comm);
    MPI_Group_free(&new_group);
    MPI_Group_free(&orig_group);
    MPI_Finalize();
    return 0;
}
Which might produce something like the following output:
rank= 7 newrank= 3 recvbuf= 22
rank= 0 newrank= 0 recvbuf= 6
rank= 1 newrank= 1 recvbuf= 6
rank= 2 newrank= 2 recvbuf= 6
rank= 6 newrank= 2 recvbuf= 22
rank= 3 newrank= 3 recvbuf= 6
rank= 4 newrank= 0 recvbuf= 22
rank= 5 newrank= 1 recvbuf= 22
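Incidentally, the same two-way split can be done in a single call with MPI_Comm_split, which puts all ranks that pass the same 'color' value into the same sub-communicator. A minimal sketch, equivalent to the group-based code above for 8 processes:

```c
#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank, new_rank, sendbuf, recvbuf;
    MPI_Comm new_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    sendbuf = rank;

    /* Ranks with the same color end up in the same sub-communicator:
     * here ranks 0-3 get color 0 and ranks 4-7 get color 1.  The last
     * argument (the 'key') orders ranks within each new communicator. */
    MPI_Comm_split(MPI_COMM_WORLD, rank < 4 ? 0 : 1, rank, &new_comm);

    MPI_Allreduce(&sendbuf, &recvbuf, 1, MPI_INT, MPI_SUM, new_comm);
    MPI_Comm_rank(new_comm, &new_rank);
    printf("rank= %d newrank= %d recvbuf= %d\n", rank, new_rank, recvbuf);

    MPI_Comm_free(&new_comm);
    MPI_Finalize();
    return 0;
}
```

The explicit group route is more flexible (you can pick arbitrary, even overlapping, rank sets), but for a simple partition like this MPI_Comm_split is less code.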