I have recently started working with MPI and am still very new to it. I ran into a problem when using MPICH2. Below is a small Fortran 90 program modified from the Hello World example. I haven't tested a C version of it, but I think it should be very similar (differing only in the function names and the error parameter).

I am working on Windows 7 64-bit with MinGW (gcc version 4.6.2, which is a 32-bit compiler) and the 32-bit version of MPICH2 1.4.1-p1. Here is the command that I used to compile the code:

gfortran hello1.f90 -g -o hello.exe -IC:\MPICH2_x86\include -LC:\MPICH2_x86\lib -lfmpich2g

And here is the simple code:

  program main
  include 'mpif.h'
  character * (MPI_MAX_PROCESSOR_NAME) processor_name
  integer myid, numprocs, namelen, rc,ierr
  integer, allocatable :: mat1(:, :, :)

  call MPI_INIT( ierr )
  call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
  call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
  call MPI_GET_PROCESSOR_NAME(processor_name, namelen, ierr)

  allocate(mat1(-36:36, -36:36, -36:36))
  mat1(:,:,:) = 0
  call MPI_Bcast( mat1(-36, -36, -36), 389017, MPI_INT, 0, MPI_COMM_WORLD, ierr )
  call MPI_Allreduce(MPI_IN_PLACE, mat1(-36, -36, -36), 389017, MPI_INTEGER, MPI_BOR, MPI_COMM_WORLD, ierr)
  print *,"MPI_Allreduce done!!!"
  print *,"Hello World! Process ", myid, " of ", numprocs, " on ", processor_name
  call MPI_FINALIZE(rc)
  end

It compiles, but it fails at run time (maybe an invalid memory access?). The problem seems to be with MPI_Allreduce, since the program works fine if I remove that line. It also works if I make the matrix smaller. I also tried it on an Ubuntu machine with the same MPI version, and there was no problem on Linux.

When I use gdb (which comes with MinGW) to check (gdb hello.exe, then backtrace), I get something that seems meaningless, at least to me:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16316.0x4fd0]
0x01c03100 in mpich2nemesis!PMPI_Wtime ()
   from C:\Windows\system32\mpich2nemesis.dll
(gdb) backtrace
#0  0x01c03100 in mpich2nemesis!PMPI_Wtime ()
   from C:\Windows\system32\mpich2nemesis.dll
#1  0x0017be00 in ?? ()
#2  0x00000000 in ?? ()

Does this mean there is something wrong with the Windows version of the MPI library? What would be the solution to make it work?

Thanks.

FortCpp
  • That 389017, the int count part of mpi_bcast, doesn't look right to me – pyCthon Sep 03 '12 at 23:15
  • Also, to test whether the MPI library is working, why not find a nice MPI hello world example and see if that compiles? – pyCthon Sep 03 '12 at 23:50
  • @pyCthon You are probably right, I got more direct error message with openmpi: [20989] *** An error occurred in MPI_Bcast [20989] *** on communicator MPI_COMM_WORLD [20989] *** MPI_ERR_TYPE: invalid datatype [20989] *** MPI_ERRORS_ARE_FATAL (goodbye) – Vladimir F Героям слава Sep 04 '12 at 08:02
  • This actually worked fine for me now that I had a chance to run the code. I tested with mpif90 gcc45 4.5.4_1, so it's definitely a Windows-related error – pyCthon Sep 04 '12 at 16:26
  • Thanks pyCthon. As I said, the code works on a Linux machine but fails if you use MinGW + MPICH2 1.4.1p1. I downloaded MPI from the official MPICH2 site. I'll ask this question on their mailing list to see if they have an answer. – FortCpp Sep 04 '12 at 18:41
  • I want to test MPI_Allreduce. The Hello World code works on all platforms, so I added MPI_Allreduce to it just to test it. – FortCpp Sep 04 '12 at 18:43
  • @pyCthon -- the 389017 looks fine to me. It is `(36+36+1)**3` which is the size of the array being passed. – mgilson Sep 05 '12 at 11:26

1 Answer

This might not fix your problem, but MPI_INT is not a Fortran MPI datatype; MPI_INTEGER is the corresponding datatype. Some implementations may provide MPI_INT on the Fortran side, but I'm pretty sure that the standard does not define it there. Try compiling your code with IMPLICIT NONE and see if the compiler complains (also test whether MPI_INTEGER .ne. MPI_INT). If it complains, MPI_INT is an undeclared variable that is getting some value from the compiler (or your version of MPI uses MPI_INT for some other datatype...). That value may conflict with one of the predefined values set by MPI. MPI would then treat your array of integers as some other type, which could result in a buffer overflow that can manifest itself in all kinds of funny ways.
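As a rough sketch of that suggestion, the program could be reduced to something like the following. It assumes the same `mpif.h` include and the array bounds from the question. Using `implicit none`, using `MPI_INTEGER` in both calls, and taking the count from `size(mat1)` instead of a hard-coded 389017 are my changes, not something the library requires. It only removes the datatype mismatch and may not by itself address a crash that is specific to the Windows build.

```fortran
program main
  implicit none            ! undeclared names become compile-time errors
  include 'mpif.h'
  integer :: myid, numprocs, ierr
  integer, allocatable :: mat1(:, :, :)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

  allocate(mat1(-36:36, -36:36, -36:36))
  mat1 = 0

  ! size(mat1) is (36+36+1)**3 = 389017, so the count always matches the array.
  ! MPI_INTEGER is the Fortran datatype handle for default integers.
  call MPI_BCAST(mat1, size(mat1), MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
  call MPI_ALLREDUCE(MPI_IN_PLACE, mat1, size(mat1), MPI_INTEGER, MPI_BOR, &
                     MPI_COMM_WORLD, ierr)

  print *, "Process ", myid, " of ", numprocs, ": MPI_Allreduce done"

  deallocate(mat1)
  call MPI_FINALIZE(ierr)
end program main
```

If a compiler still accepts `MPI_INT` with `implicit none` in place, that would suggest the implementation's `mpif.h` declares it, and printing both constants would show whether they are the same handle.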

mgilson
  • Thanks a lot mgilson. Though it doesn't fix the MPI_Allreduce problem it is a good point! MPI_INTEGER and MPI_INT are different. They have different values (1275069467 and 1275069445). But even when I add implicit none, there is no error message regarding it. You are right. I should use MPI_INTEGER instead of MPI_INT. – FortCpp Sep 05 '12 at 16:59