fork()/execv() hangs on MPI nodes (C++)

Question

I am writing a C++ program with MPI that will launch external programs on MPI nodes. For this I use fork()/execv().

The problem is that the process starts normally but then freezes at some point when a large number of CPUs is used (nCPU > 48). I have reason to believe that the problem is caused by the method that uses fork()/execv().

The code:

#include <mpi.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <vector>

using namespace std;

int execute (char *task) {
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    pid_t child_pid;
    int status = 0;

    if ((child_pid = fork()) < 0) {
        perror("Warning (fork failure)");
        cout << "Warning: fork failure on Node#" << world_rank << " with task " << task << endl;
        exit(1);
    }

    if (child_pid == 0) {
        //Execute in the child process

        //Prepare command line arguments as a null-terminated array
        //(strtok splits task on spaces, modifying it in place)
        std::vector<char*> args;
        char* tasks = strtok(task, " ");
        while(tasks != NULL) {
            args.push_back(tasks);
            tasks = strtok(NULL, " ");
        }
        args.push_back(NULL);

        //Execute program args[0] with arguments args[0], args[1], args[2], etc.
        execv(args[0], &args.front());

        //Only reached if execv fails; task now holds just the first token
        perror("Warning (execv failure)");
        cout << "Warning: execv failure on Node#" << world_rank << " with task " << task << endl;
        _exit(1);
    }
    else {
        //Parent process - wait for this particular child to finish
        waitpid(child_pid, &status, 0);
    }

    //Return the raw wait status (decode it with WIFEXITED/WEXITSTATUS)
    return status;
}
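
The execute() function returns the raw status word filled in by wait(), so a caller has to decode it with the WIFEXITED/WEXITSTATUS macros from <sys/wait.h> (on Linux, for example, an exit code of 1 arrives as 256). The following is a minimal sketch, not tied to MPI and not claimed to fix the hang: the hypothetical helper run_and_wait() builds argv before fork() (as the RESOLVED version of the code also does), waits for that specific child with waitpid(), and turns the status into a plain exit code. It assumes a POSIX system; returning 127 for a failed exec and 128 + signal number for a killed child are shell conventions, not requirements.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): runs argv[0] with the given
// arguments and returns the child's exit code, 128 + the signal number if the
// child was killed by a signal, or -1 if the request could not be carried out.
int run_and_wait(const std::vector<std::string>& argv) {
    if (argv.empty())
        return -1;

    // Build the NULL-terminated char* array *before* fork(), so the child
    // only has to call execv() and, on failure, _exit().
    std::vector<char*> args;
    for (const std::string& a : argv)
        args.push_back(const_cast<char*>(a.c_str()));  // execv() does not modify them
    args.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                          // fork() failed
    if (pid == 0) {
        execv(args[0], args.data());        // returns only on failure
        _exit(127);                         // shell convention for "could not run"
    }

    // Wait for this particular child, retrying if a signal interrupts us.
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);         // normal exit: 0..255
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);      // terminated by a signal
    return -1;
}
```

For instance, run_and_wait({"/bin/sh", "-c", "exit 3"}) yields 3, whereas execute() would hand back the encoded status word.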

RESOLVED (at least, it looks like it)

I've modified my code to use vfork() instead of fork(), and now everything works. Note that when vfork() returns in the child process, the child must immediately call _exit() or one of the exec functions (such as execve()).

The code:

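#include <mpi.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

extern char **environ;

int execute (char *task) {
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    pid_t child_pid;
    int status = 0;

    //Prepare command line arguments as a null-terminated array
    //(done before vfork, because the vfork child must not do this work)
    std::vector<char*> args;
    char* tasks = strtok(task, " ");
    while(tasks != NULL) {
        args.push_back(tasks);
        tasks = strtok(NULL, " ");
    }
    args.push_back(NULL);

    if ((child_pid = vfork()) < 0) {
        perror("Warning (vfork failure)");
        fprintf(stderr, "Warning: vfork failure on Node#%d\n", world_rank);
        exit(1);
    }

    //Child process
    if (child_pid == 0) {

        // !!! Since we are using vfork the child process must immediately call _exit or an exec function !!!
        // !!!    _Nothing_ else should be done in this part of the code to avoid corruption    !!!

        //Execute program args[0] with arguments args[0], args[1], args[2], etc.
        //Pass the parent's environment explicitly; a NULL envp is not portable
        execve(args[0], &args.front(), environ);
        _exit(1);
    }
    //Parent process
    else {
        //Wait for this particular child process
        waitpid(child_pid, &status, 0);
    }

    //Return the raw wait status (decode it with WIFEXITED/WEXITSTATUS)
    return status;
}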
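
Instead of a hand-written vfork(), POSIX provides posix_spawn() (declared in <spawn.h>) for exactly this spawn-then-wait pattern; the C library decides how to create the child, so the caller does not have to honour the vfork() rules by hand. This is a swapped-in technique, not what was tested on the cluster here, and whether it avoids the hang on a particular MPI stack is an assumption to verify. A minimal sketch with a hypothetical helper spawn_and_wait():

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

extern char **environ;

// Hypothetical helper: launches argv[0] with posix_spawn() and waits for it.
// Returns the child's exit code, or -1 if spawning or waiting failed or the
// child did not exit normally.
int spawn_and_wait(const std::vector<std::string>& argv) {
    if (argv.empty())
        return -1;

    // NULL-terminated argument array, built in the parent.
    std::vector<char*> args;
    for (const std::string& a : argv)
        args.push_back(const_cast<char*>(a.c_str()));  // posix_spawn() does not modify them
    args.push_back(nullptr);

    pid_t pid;
    // No file actions and no attributes; the child inherits our environment.
    int rc = posix_spawn(&pid, args[0], nullptr, nullptr, args.data(), environ);
    if (rc != 0)
        return -1;                          // rc is an errno value such as ENOENT

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that a missing executable may surface either as a non-zero return from posix_spawn() or as a child exiting with status 127, depending on the C library.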

Do you get anything from attaching a debugger to the running process? Or strace? `gdb -p ` — Peter Cordes, Jul 21 '15 at 15:43
I am running the program on a supercomputer, so I don't have a PID, only the task number. If I run the program on a local 8-core CPU, I never get any errors or problems. — sda, Jul 21 '15 at 15:58
There are ways to do that on a supercomputer too: http://www.archer.ac.uk/documentation/best-practice-guide/debug.php — Vladimir F Героям слава, Jul 21 '15 at 23:04
Maybe ask your cluster admin if there's a baby-cluster where you can test? One that lets you get an interactive session on a compute node, so you can find your process and debug it? All I can think of is MPI sockets being set `FD_CLOEXEC` or not, and things like that. — Peter Cordes, Jul 22 '15 at 04:54
Another idea: have a job fork/exec `strace -f -otrace.mypid -p mypid` to attach an strace to itself before continuing. Make sure to use a small job, in case you end up with a larger flood of trace output than you expected! — Peter Cordes, Jul 22 '15 at 04:56
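
Peter Cordes's last suggestion, having a job fork/exec strace so that it attaches to the job itself, might look like the minimal sketch below. The helper names strace_args() and attach_strace_to_self() and the one-second settle delay are hypothetical choices, and it assumes strace is on PATH on the compute node. On Linux systems where the Yama module sets ptrace_scope=1, a child is normally refused permission to trace its own parent, which is why the sketch calls prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY); that relaxes the restriction for the whole process, so it only suits a debugging run, and stricter ptrace_scope settings may still block the attach. Since fork() itself is under suspicion here, run it on a small job.

```cpp
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <string>
#include <vector>
#ifdef __linux__
#include <sys/prctl.h>
#endif

// Hypothetical helper: argument list for
//   strace -f -otrace.<pid> -p <pid>
std::vector<std::string> strace_args(pid_t pid) {
    const std::string p = std::to_string(pid);
    return {"strace", "-f", "-otrace." + p, "-p", p};
}

// Hypothetical helper: fork/exec strace so that it attaches to the calling
// process. Returns the strace pid, or -1 if fork() failed.
pid_t attach_strace_to_self(unsigned settle_seconds = 1) {
    std::vector<std::string> argv = strace_args(getpid());
    std::vector<char*> args;
    for (std::string& a : argv)
        args.push_back(&a[0]);
    args.push_back(nullptr);

#if defined(__linux__) && defined(PR_SET_PTRACER) && defined(PR_SET_PTRACER_ANY)
    // Allow any process to ptrace us (Yama); the error is ignored when Yama
    // is not active, because then no permission is needed.
    prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0);
#endif

    pid_t pid = fork();
    if (pid == 0) {
        execvp(args[0], args.data());   // search PATH for strace
        _exit(127);                     // strace not found or not executable
    }
    if (pid > 0)
        sleep(settle_seconds);          // crude: give strace time to attach
    return pid;
}
```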

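The FD_CLOEXEC remark refers to the per-descriptor close-on-exec flag: a descriptor without it (possibly including sockets an MPI library opened) stays open in the program started by execv()/execve(). Whether a given MPI library sets the flag on its own sockets is not something this snippet can determine; it is a minimal sketch of inspecting and setting the flag with fcntl() on descriptors you control, using hypothetical helper names set_cloexec() and is_cloexec().

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: mark an existing descriptor close-on-exec, so it is
// closed automatically in any program started with an exec function.
// Returns 0 on success, -1 on error (errno is set by fcntl).
int set_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);   // descriptor flags, not file status flags
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

// Hypothetical helper: true if fd is valid and has FD_CLOEXEC set.
bool is_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}
```

Descriptors you create yourself can instead get the flag atomically at creation time, for example with O_CLOEXEC in open().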