
My basic question is about how suppression files work in Valgrind. Much of the documentation I have read says to run the following with Open MPI versions > 1.5 (mine is 1.6):

    mpirun -np 2 valgrind --suppressions=/usr/share/openmpi/openmpi-valgrind.supp --track-origins=yes ./myprog

However, when I run it like this, Valgrind still reports over 600 errors! The errors I am getting are these two, over and over. With my current understanding of Valgrind and MPI, I don't know how to interpret either one.

    ==8821==  Address 0xad5e4d7 is 87 bytes inside a block of size 128 alloc'd
    ==8821==    at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==8821==    by 0x6348C52: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
    ==8821==    by 0x6349AF1: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
    ==8821==    by 0x6349B81: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
    ==8821==    by 0x7DA5B9C: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
    ==8821==    by 0x7DA52F4: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
    ==8821==    by 0x5082AF2: ??? (in /usr/lib/openmpi/lib/libmpi.so.0.0.2)
    ==8821==    by 0x50A33FA: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.0.0.2)
    ==8821==    by 0x408AB5: main (test_send-receive.cpp:8)
    ==8821==  Uninitialised value was created by a heap allocation
    ==8821==    at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==8821==    by 0x635FE2B: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
    ==8821==    by 0x6360634: opal_ifcount (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
    ==8821==    by 0x81B36AA: ??? (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
    ==8821==    by 0x5C01EE2: mca_oob_base_init (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0)
    ==8821==    by 0x7FA97FB: ??? (in /usr/lib/openmpi/lib/openmpi/mca_rml_oob.so)
    ==8821==    by 0x5C083E4: orte_rml_base_select (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0)
    ==8821==    by 0x5BF5EC4: orte_ess_base_app_setup (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0)
    ==8821==    by 0x7BA1EAE: ??? (in /usr/lib/openmpi/lib/openmpi/mca_ess_env.so)
    ==8821==    by 0x5BDDB72: orte_init (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0)
    ==8821==    by 0x50822E0: ??? (in /usr/lib/openmpi/lib/libmpi.so.0.0.2)
    ==8821==    by 0x50A33FA: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.0.0.2)

The code that produces these errors is:

    // test_send-receive.cpp
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char *argv[]) {

      /* init MPI */
      MPI_Init(&argc, &argv);

      int myid;
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      int i;
      if (myid == 0) {
        double *d = new double[10];
        for (i = 0; i < 10; i++) {
          d[i] = i + 1.0;
        }
        MPI_Send(d, 10, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
        delete[] d;
      } else {
        MPI_Status status;
        double *c = new double[10];
        MPI_Recv(c, 10, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

        for (i = 0; i < 10; i++) {
          printf("%f\n", c[i]);
        }
        delete[] c;
      }
      MPI_Finalize();
      return 0;
    }

Also, this code runs just fine and outputs the expected results. Am I misunderstanding how the data is sent over the network, or is there something else going on here that I don't understand?

Sorry about the length of the post; you guys rock for even reading this far.

Muttonchop
  • Do the Valgrind errors occur when the code takes the myid==0 path, the "else" path, or both? – Jeremy Friesner Jun 27 '12 at 04:19
  • Also, you aren't checking the return value of the MPI_Recv() call (or the 'status' variable) to see whether MPI_Recv() succeeded. It might be that MPI_Recv() is failing for some reason and not writing any data to the (c) array, which would cause an uninitialized-memory-read error in the printf() call that follows. Just a guess. – Jeremy Friesner Jun 27 '12 at 04:22
  • @JeremyFriesner, testing for return values is **not necessary** unless one has changed the error handler for the communicator. The default standard error handler aborts the application if an operation returns something other than `MPI_SUCCESS` (not valid for MPI I/O operations though). – Hristo Iliev Jun 27 '12 at 10:41
  • This is not a question suitable for SO. It seems Valgrind detects something in the OPAL library of Open MPI (it could be a false positive). You should take the problem to the Open MPI [users list](http://www.open-mpi.org/community/lists/ompi.php) or open a ticket in their [trac system](https://svn.open-mpi.org/trac/ompi/) if you think it's a bug. Besides, it happens deep within `PMPI_Init`, which implements the `MPI_Init()` operation. It has nothing to do with the rest of your code. – Hristo Iliev Jun 27 '12 at 10:46

1 Answer


It's quite possible that our suppression file is not up-to-date in OMPI v1.6. :-\

You should report this on the OMPI mailing list. See http://www.open-mpi.org/community/lists/ompi.php.

Jeff Squyres
  • Could this still be the case? I have had to generate a more comprehensive suppression file to suppress all the errors, even for the most minimal MPI program (initialize, then finalize). – alfC Jul 22 '20 at 02:02