When using MPI_Reduce, I need to handle a complex data structure, so I have to define my own reduction function with MPI_Op_create. The problem is that one or more processes keep crashing, even with a very simple user-defined function that is equivalent to MPI_SUM.

The code is attached below. If I change the operation argument of MPI_Reduce from "myOp" to "MPI_SUM", the code works perfectly, so I'm fairly sure the problem lies in the user-defined operation "myOp". However, further debugging inside "MAX_DataSet" showed that the function itself runs and produces the correct result; the program simply cannot complete MPI_Reduce. What could the reason be? Any help is greatly appreciated!

#include <iostream>
#include "mpi.h"

using std::cout;
using std::endl;

void MAX_DataSet(int *in, int *inout, int *len, MPI_Datatype *datatype);

int main(int argc,char **argv)
{
    int procSize, procID, x, sum;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &procSize);
    MPI_Comm_rank(MPI_COMM_WORLD, &procID);
    MPI_Op myOp;
    MPI_Op_create((MPI_User_function*)MAX_DataSet, true, &myOp);

    if (procID == 0)
        x = 15;
    else
        x = procID;
    MPI_Reduce(&x, &sum, 1, MPI_INT, myOp, 0, MPI_COMM_WORLD);
    if (procID == 0)    // process 0 never gets this far
        cout << sum << endl;

    MPI_Finalize();
    return 0;
}

void MAX_DataSet(int *in, int *inout, int *len, MPI_Datatype *datatype)
{
    for (int i = 0; i < *len; i++)
    {
        *inout = *in + *inout;
        in++;
        inout++;
    }
}
OwenShi
  • When you say "crashing", what do you mean? Can you provide the error output? – Wesley Bland May 05 '15 at 19:54
  • The code runs just fine for me. Which MPI implementation / version are you using? – Wesley Bland May 05 '15 at 20:02
  • Thanks for your prompt reply, Velimir. I've added the missing MPI_Op part. The typical error message from mpiexec is "job aborted: [ranks] message [0] process exited without calling finalize. mpi_max.exe ended prematurely and may have crashed. exit code 0xc0000005". – OwenShi May 05 '15 at 20:03
  • And by "crashing", I mean the above error message. – OwenShi May 05 '15 at 20:06
  • This is probably a problem with your library. The code is fine. – Wesley Bland May 05 '15 at 20:08
  • I was debugging on both Windows and the Stampede supercomputer. The error from Stampede is [c557-301.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI process (rank: 0, pid: 8853) exited with status 1 [c557-301.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI process (rank: 1, pid: 8854) exited with status 1. If it were a library issue, I wouldn't expect it to fail on Stampede as well... Actually, I also think my code is correct. – OwenShi May 05 '15 at 20:10
  • Those are just generic error messages though. Which MPI library are you using? MPICH? Open MPI? Which version? – Wesley Bland May 05 '15 at 20:11
  • I'm using MS-MPI, Microsoft HPC Pack 2012. – OwenShi May 05 '15 at 20:12
  • As far as I know, MS-MPI supports user-defined reductions (https://msdn.microsoft.com/en-us/library/dn473455(v=vs.85).aspx). You might need to ask the TACC user support folk directly if they know differently. They also might have another MPI library installed that you can use. – Wesley Bland May 05 '15 at 20:17
  • Also, since when is Stampede a Windows machine? Are you compiling on your laptop and running on Stampede? If so, that's your problem. They're not binary compatible. I'd be shocked if anything ran. – Wesley Bland May 05 '15 at 20:22
  • Thanks a lot, Wesley. I will try to find out how to solve the library issue. Before coming here, I also tried several existing codes that use user-defined functions, but none of them ran successfully. So I'm pretty sure it's a library/environment issue now. – OwenShi May 05 '15 at 20:23
  • If you're compiling your code on Stampede, you should be using either MVAPICH or Intel MPI, both of which definitely support `MPI_Op_create`. Make sure you're using the right versions before you go too far down the user support path. – Wesley Bland May 05 '15 at 20:26
  • I know I have to compile on Stampede with a script file, since it runs Unix. I was able to run several other, much more complex codes on it. If I have to avoid MPI_Op_create, I will need to do far more programming with MPI_Send/Recv to achieve the same thing... Anyway, I will try to figure it out by investigating the libraries. Thanks again! – OwenShi May 05 '15 at 20:36
  • Your op doesn't match the type signature of a user-defined reduction, although this is not likely to be the source of the error. – Jeff Hammond May 06 '15 at 19:45
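
To illustrate the last comment: the MPI standard declares MPI_User_function as taking (void *invec, void *inoutvec, int *len, MPI_Datatype *datatype), so a callback written with that exact signature can be passed to MPI_Op_create without a cast. Below is a minimal sketch of the same element-wise sum in that form. The name sumInts is hypothetical. The MPIAPI macro is an assumption about the MS-MPI headers (as far as I know it selects the calling convention there); it is defined as empty when the header does not provide it. The stand-in typedef exists only so the fragment compiles where no MPI header is installed. Whether the signature mismatch is what causes the crash in the question is not established here.

```cpp
#if __has_include(<mpi.h>)
#include <mpi.h>
#else
typedef int MPI_Datatype;  // stand-in so the sketch compiles without MPI (assumption)
#endif

#ifndef MPIAPI
#define MPIAPI  // assumed MS-MPI calling-convention macro; empty on other implementations
#endif

// Same parameter list as MPI_User_function: void*, void*, int*, MPI_Datatype*.
// The buffers are cast inside the function instead of casting the function pointer.
void MPIAPI sumInts(void *invec, void *inoutvec, int *len, MPI_Datatype * /*datatype*/)
{
    const int *in = static_cast<const int *>(invec);
    int *inout = static_cast<int *>(inoutvec);
    for (int i = 0; i < *len; ++i)
        inout[i] += in[i];  // element-wise sum, result accumulated in inout
}
```

With a callback of this shape, the registration in main would read MPI_Op_create(sumInts, 1, &myOp), with no (MPI_User_function*) cast, so the compiler can flag any mismatch in parameter types or calling convention instead of silently accepting it. Calling MPI_Op_free(&myOp) before MPI_Finalize releases the operation handle.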
