
We are writing a code that solves a nonlinear problem with an iterative (Newton) method. The difficulty is that we do not know a priori how many MPI processes will be needed from one iteration to the next, due to e.g. remeshing, adaptivity, etc. And there are quite a lot of iterations...

We would therefore like to call MPI_Comm_spawn at each iteration to create as many MPI processes as we need, gather the results and "destroy" the subprocesses. We know this limits the scalability of the code because of the gathering of information, however, we have been asked to do it :)
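
To make the intent concrete, here is a rough sketch of what one iteration could look like on the parent side (a sketch only: the number of workers and the MPI_Reduce-based gathering are placeholders for whatever the real solver needs; the test code below replaces the gathering by a simple barrier):

//Sketch of one spawn/gather/disconnect cycle (parent side)
#include <iostream>
#include <vector>
#include <mpi.h>
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int n_needed = 4; // would be decided by the remeshing/adaptivity step
    MPI_Comm children;
    std::vector<int> spawn_err(n_needed);
    MPI_Comm_spawn("StackWorkers.exe", MPI_ARGV_NULL, n_needed, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &children, &spawn_err[0]);

    // Gather (here: sum) the workers' contributions over the intercommunicator:
    // the root of the receiving group passes MPI_ROOT as the root argument.
    double result = 0.0;
    MPI_Reduce(NULL, &result, 1, MPI_DOUBLE, MPI_SUM, MPI_ROOT, children);
    std::cout << "Gathered result: " << result << std::endl;

    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}

On the worker side, the matching call would be MPI_Reduce(&local_contribution, NULL, 1, MPI_DOUBLE, MPI_SUM, 0, parent), where parent comes from MPI_Comm_get_parent and 0 is the rank of the parent in its own group.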

I did a couple of tests of MPI_Comm_spawn on my laptop (Windows 7, 64-bit) with Intel MPI and Visual Studio Express 2013. I tried these simple programs:

//StackMain
#include <iostream>
#include <vector>
#include <mpi.h>
int main(int argc, char *argv[])
{
    int ierr = MPI_Init(&argc, &argv);
    for (int i = 0; i < 10000; i++)
    {
        std::cout << "Loop number " << i << std::endl;

        // Spawn 4 workers, meet them at a barrier, then disconnect from them.
        MPI_Comm children;
        std::vector<int> err(4);
        ierr = MPI_Comm_spawn("StackWorkers.exe", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                              0, MPI_COMM_WORLD, &children, &err[0]);
        MPI_Barrier(children);
        MPI_Comm_disconnect(&children);
    }
    ierr = MPI_Finalize();
    return 0;
}

And here is the program run by the spawned processes:

//StackWorkers
#include <mpi.h>
int main(int argc, char *argv[])
{
    int ierr = MPI_Init(&argc, &argv);

    // Retrieve the intercommunicator to the parent and meet it at the barrier.
    MPI_Comm parent;
    ierr = MPI_Comm_get_parent(&parent);
    MPI_Barrier(parent);

    ierr = MPI_Finalize();
    return 0;
}

The program is launched using one MPI process:

mpiexec -np 1 StackMain.exe

It seems to work, but I do have some questions...

1- The program freezes at iteration 4096, and this number does not change if I relaunch the program. If at each iteration I spawn 4 processes twice, then it stops at iteration 2048... Is this a limitation of the operating system?

2- When I look at the memory occupied by "mpiexec" while the program runs, it grows continuously (and never goes down). Do you know why? I thought that when the subprocesses finished their job, they would release the memory they used...

3- Should I disconnect/free the children communicator or not? If yes, must MPI_Comm_disconnect(...) be called by both the parent and the spawned (child) processes, or only by the parent?
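
For reference, here is what I mean by calling it "on both sides" (a sketch only, and precisely what I am unsure about): the worker would disconnect from its parent before finalizing, mirroring the MPI_Comm_disconnect already done in StackMain.

//StackWorkers, variant that also disconnects on the child side
#include <mpi.h>
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);
    MPI_Barrier(parent);

    // Would this child-side disconnect be required, or is the parent's enough?
    MPI_Comm_disconnect(&parent);

    MPI_Finalize();
    return 0;
}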

Thanks a lot!

bertbk
  • Unfortunately, I do not know the answer, having never used `MPI_Comm_spawn` before. However, I believe that questions 1 and 2 are linked. If memory consumption keeps growing, it's very possible that you reach a limit on your laptop (at which point the program cannot allocate any more and freezes). This limit would be reached at the same iteration of your for loop every time. – NoseKnowsAll Sep 10 '15 at 15:02
  • Thanks for your answer. Sorry, I was not precise enough: the memory reaches about 170 MB at the time of the "freeze" (only the program freezes, not the computer). So I don't think I am hitting any memory limit on my laptop (16 GB). – bertbk Sep 10 '15 at 16:01
  • 3
    Great to see a question on `MPI_SPAWN`. I've not used this much either but got through all 99999 with your code on linux (Ubuntu 12.04, gcc 5.2.0, MPICH version 3.1, `MPI_UNIVERSE_SIZE=1681915913`), no apparent memory increase and `valgrind` reports no leaks. Your example looks similar to minimal example (http://www.mpi-forum.org/docs/mpi-2.0/mpi-20-html/node98.htm). Maybe upgrade mpi version? – Ed Smith Sep 10 '15 at 18:31
  • That's great news for us, thank you! I'm going to check/upgrade my MPI version... – bertbk Sep 11 '15 at 08:27

0 Answers