
I wrote a very simple MPI program in Fortran to experiment with parallel programming. All it does is compute the sum 1+2+3+...+N, split across multiple threads. It works! But here is the weird thing: it only works if I leave a certain output statement in the code. If I remove it or comment it out, I get a segfault after recompiling. Why is that? Is there some kind of latency involved, so that the sum cannot be done directly after the receive? A simple output statement should, in my mind, not alter the program so that a SEGFAULT suddenly occurs. I tried several combinations of N and number of threads, but it seems to come down to the output. Enlighten me :-)

The line in question is marked with !!! HERE

-I compile with: mpif90 mpi_test.f90 -g

-I then execute with: mpirun -n 4 ./a.out

program mpi_test
implicit none
include 'mpif.h'
!------------------------------------------------------------------------------
integer,dimension(MPI_STATUS_SIZE)  ::status
integer                 ::my_rank,mpi_size, error_mpi
integer                 ::dest,my_start,my_end,my_summ=0,i,bigsum=0,N = 50000
double precision        ::starttime,endtime
!------------------------------------------------------------------------------
call MPI_INIT(error_mpi)                ! Initialize MPI
call MPI_COMM_SIZE(MPI_COMM_WORLD, mpi_size, error_mpi) ! get no of THREADS
call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank,  error_mpi) ! distribute ranks
call cpu_time(starttime)                ! call counter for benchm.
!--------------------------------START-----------------------------------------
my_start =((my_rank)*(N/mpi_size))+1    ! calculate where to start/end the 
my_end = my_start+(N/mpi_size)-1    ! summation in each thread
do i =my_start,my_end       !summ up the partial sum given to thread
    my_summ = my_summ +i
end do
!-----------------------------------------------------------------------------
if(my_rank .NE. 0 ) then    !send result to master Thread
    call MPI_SEND(my_summ,1,MPI_INT,0,5,MPI_COMM_WORLD,error_mpi)
end if
!-----------------------------------------------------------------------------

if(my_rank .EQ. 0) then
    bigsum= my_summ     !first sum part is that of master
    do i=1,mpi_size-1   !receive summation parts from threads and add
        call MPI_RECV(my_summ,1,MPI_INT,i,5,MPI_COMM_WORLD,status)
        !!! HERE
        !write(*,*)'Master received sum:',my_summ,' from ',i, 'with status:',status
        !!! HERE
        bigsum = bigsum+my_summ
    end do
end if
!-----------------------------------------------------------------------------

call MPI_BARRIER(MPI_COMM_WORLD,error_mpi)
if(my_rank .EQ. 0) then     !compare output to simple serial calculation
    call cpu_time(endtime)
    write(*,*) 'the big sum is:',bigsum, 'parallel time:', dble(endtime-starttime),'sec.'
    call cpu_time(starttime)
    bigsum = 0
    do i = 1,N
        bigsum = bigsum+i
    end do
    call cpu_time(endtime)
    write(*,*) 'the big sum is:',bigsum, 'serial time:', dble(endtime-starttime),'sec.'
end if
!-------------------------------END--------------------------------------------
call MPI_BARRIER(MPI_COMM_WORLD,error_mpi)  !wait for every thread then Finalize
call MPI_FINALIZE(error_mpi)
!------------------------------------------------------------------------------
end program

The working output (write output left inside):

Master received sum:   234381250  from      1 with status:   [...]

Master received sum:   390631250  from      2 with status:   [...]

Master received sum:   546881250  from      3 with status:   [...]

the big sum is:  1250025000 parallel time:   1.3100000000000264E-004 sec.

the big sum is:  1250025000 serial time:   1.3700000000000170E-004 sec.

And the output if write() is commented:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7F173CBA87D7
#1  0x7F173CBA8DDE
#2  0x7F173C800D3F
#3  0x7F173CF181B7
#4  0x400E84 in mpi_test at mpi_test.f90:29
  • I followed this: http://www.mpich.org/static/docs/v3.1/www3/MPI_Recv.html. Is it wrong? And how come it works if the write(*,*) is left in? Oh, and thanks for replying! – user5925562 Feb 14 '16 at 17:03
  • Read further: "All MPI routines in Fortran (except for MPI_WTIME and MPI_WTICK) have an additional argument ierr at the end of the argument list." You have that argument in your `mpi_send` but not in `mpi_recv`. [I'm sure there must be a similar question asked around here, but I can't find a good one to point you to.] – francescalus Feb 14 '16 at 17:06
  • Yes, that does the trick! But how come it worked with the write output? That's some strange behaviour IMHO. – user5925562 Feb 14 '16 at 17:06
  • Possibly: http://stackoverflow.com/q/34756900. – francescalus Feb 14 '16 at 17:07
  • Essentially, by the time you introduce potential stack corruption (bad argument count) anything can happen. – francescalus Feb 14 '16 at 17:10
  • ok. thank you very much! – user5925562 Feb 14 '16 at 17:11
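
For reference, here is a minimal sketch of the fix described in the comments above: the `MPI_RECV` call gets the trailing `ierr` argument (here `error_mpi`) that the Fortran binding expects. It is a trimmed version of the program from the question, not a definitive implementation. The program name `mpi_sum_fixed` and the separate receive buffer `part` are made up for illustration. Swapping `MPI_INT` (the C datatype handle) for `MPI_INTEGER` (the Fortran one) is an extra change that was not discussed in the comments.

```fortran
program mpi_sum_fixed              ! hypothetical name, for illustration only
    implicit none
    include 'mpif.h'
    integer, dimension(MPI_STATUS_SIZE) :: status
    integer :: my_rank, mpi_size, error_mpi
    integer :: i, my_start, my_end, my_summ, part, bigsum
    integer, parameter :: N = 50000

    call MPI_INIT(error_mpi)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, mpi_size, error_mpi)   ! number of processes
    call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, error_mpi)    ! rank of this process

    ! Each rank sums its own slice of 1..N.
    ! As in the question, this assumes N is divisible by mpi_size.
    my_start = my_rank*(N/mpi_size) + 1
    my_end   = my_start + (N/mpi_size) - 1
    my_summ  = 0
    do i = my_start, my_end
        my_summ = my_summ + i
    end do

    if (my_rank /= 0) then
        ! Workers send their partial sum to rank 0.
        call MPI_SEND(my_summ, 1, MPI_INTEGER, 0, 5, MPI_COMM_WORLD, error_mpi)
    else
        bigsum = my_summ
        do i = 1, mpi_size - 1
            ! The fix: error_mpi is passed as the final argument.
            ! 'part' is a hypothetical separate buffer, so my_summ is not overwritten.
            call MPI_RECV(part, 1, MPI_INTEGER, i, 5, MPI_COMM_WORLD, status, error_mpi)
            bigsum = bigsum + part
        end do
        write(*,*) 'the big sum is:', bigsum
    end if

    call MPI_FINALIZE(error_mpi)
end program mpi_sum_fixed
```

Built and run the same way as in the question (`mpif90`, then `mpirun -n 4 ./a.out`), this should print the same sum as the working run above. With `include 'mpif.h'` the compiler has no interface to check the call against, which is why the missing argument went unnoticed. Depending on the MPI implementation and compiler, replacing the include with `use mpi` may let the compiler reject a call with the wrong argument count at build time.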
