I am encountering precision problems with MPI_REDUCE() in Fortran. I have tested two methods of summing double precision numbers stored on each node. The MPI_REDUCE() line I use is

call MPI_REDUCE(num,total,1,MPI_DOUBLE_PRECISION,MPI_SUM,0,MPI_COMM_WORLD,ierr)

which sums the values of "num" from every core and stores the result in "total" on the root core.

The other method I use involves sends and receives

! Gather the partial values onto rank 0 and accumulate them in rank order
if (rank .eq. 0) total = num
do i = 1,nproc-1
    if (rank .eq. i) call MPI_SEND(num,1,MPI_DOUBLE_PRECISION,0,&
                                   100,MPI_COMM_WORLD,ierr)
    if (rank .eq. 0) then
        call MPI_RECV(num,1,MPI_DOUBLE_PRECISION,i,&
                      100,MPI_COMM_WORLD,stat,ierr)
        total = total + num
    end if
end do

The latter always gives me the same number for total, while the former produces a different value depending on the number of processors I use (it usually changes by about 1x10^-5). ierr is 0 in all cases. Am I doing something wrong?
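
For reference, a minimal self-contained version of the comparison I'm describing looks roughly like this (the declarations and the per-rank value of num are illustrative placeholders, not my actual data):

program reduce_test
    use mpi
    implicit none
    integer :: rank, nproc, ierr, i
    integer :: stat(MPI_STATUS_SIZE)
    double precision :: num, recvbuf, total_reduce, total_manual

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)

    ! Illustrative per-rank value; the real code fills num elsewhere
    num = 1.0d0 / dble(rank + 1)

    ! Method 1: built-in reduction onto rank 0
    call MPI_REDUCE(num, total_reduce, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                    0, MPI_COMM_WORLD, ierr)

    ! Method 2: manual sends/receives, summed in rank order on rank 0
    if (rank .eq. 0) total_manual = num
    do i = 1, nproc - 1
        if (rank .eq. i) call MPI_SEND(num, 1, MPI_DOUBLE_PRECISION, 0, &
                                       100, MPI_COMM_WORLD, ierr)
        if (rank .eq. 0) then
            call MPI_RECV(recvbuf, 1, MPI_DOUBLE_PRECISION, i, &
                          100, MPI_COMM_WORLD, stat, ierr)
            total_manual = total_manual + recvbuf
        end if
    end do

    if (rank .eq. 0) print *, total_reduce, total_manual
    call MPI_FINALIZE(ierr)
end program reduce_test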

Thanks

DJames

1 Answer

Floating-point arithmetic is not strictly associative; the order in which the operations are performed can have an impact on the result. While

(a+b)+c == a+(b+c)

is true for real numbers (in the mathematical sense rather than the Fortran sense), it is not (universally) true for floating-point numbers. It is not surprising, therefore, that the built-in reduction produces a result that differs from your own home-spun reduction. As you vary the number of processors you have no control over the order of the individual additions in the computation; even on a fixed number of processors I wouldn't be surprised by a small difference between the results of different executions of the same program. In contrast, your own reduction always performs the additions in the same order.
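
To see the effect in isolation, here is a tiny stand-alone example (the values are chosen purely to make the rounding visible):

program fp_assoc
    implicit none
    double precision :: a, b, c

    a = 1.0d16
    b = -1.0d16
    c = 1.0d0

    ! Mathematically both groupings equal 1.0, but the rounded
    ! intermediate results differ:
    print *, '(a+b)+c = ', (a + b) + c   ! prints 1.0
    print *, 'a+(b+c) = ', a + (b + c)   ! prints 0.0: b+c rounds back to -1.0d16
end program fp_assoc

The same mechanism is at work, on a much smaller scale, when partial sums are combined in a different order across processors.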

What is the relative error of the two results? The figure of 10^(-5) tells us only the absolute error, and that doesn't allow us to conclude that the discrepancy can be explained entirely by the non-associativity of floating-point arithmetic.
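
Something along the following lines on the root rank would tell us (total_reduce and total_manual are placeholder names for your two results):

! Placeholder names: total_reduce and total_manual hold the two results
if (rank .eq. 0) then
    print *, 'absolute difference: ', abs(total_reduce - total_manual)
    print *, 'relative difference: ', &
             abs(total_reduce - total_manual) / abs(total_manual)
end if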

High Performance Mark
  • Thanks for the reply. Here is an example. For 4 cores, each stores one of the following numbers: -0.016578650712928463, -0.005729268089031345, -0.012569993133665655, -0.055321271877137639. My send/recv method gives a result of -9.0199183812763095E-002; the MPI_REDUCE method gives a result of -9.0199110912786068E-002. Mathematica agrees with my send/recv method. Thanks again – DJames Oct 15 '13 at 22:10
  • Not only this, but `MPI_SUM` and all other predefined MPI reduction operators are also **commutative** and therefore the arguments of the summation could be paired in any order. – Hristo Iliev Oct 16 '13 at 15:51
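
For context, commutativity is a property you choose when defining your own reduction operator. Below is a rough, untested sketch of a user-defined sum declared non-commutative with MPI_OP_CREATE (illustrative only; note that a non-commutative operator only stops MPI from swapping operands, it may still regroup them using associativity):

program custom_op
    use mpi
    implicit none
    external :: my_sum
    integer :: rank, ierr, my_op
    double precision :: num, total

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    num = 1.0d0 / dble(rank + 1)          ! illustrative per-rank value

    ! commute = .false.: operands may not be swapped, only regrouped
    call MPI_OP_CREATE(my_sum, .false., my_op, ierr)
    call MPI_REDUCE(num, total, 1, MPI_DOUBLE_PRECISION, my_op, &
                    0, MPI_COMM_WORLD, ierr)
    call MPI_OP_FREE(my_op, ierr)

    if (rank .eq. 0) print *, 'total = ', total
    call MPI_FINALIZE(ierr)
end program custom_op

! User-defined reduction function: inoutvec(i) = invec(i) + inoutvec(i)
subroutine my_sum(invec, inoutvec, len, dtype)
    implicit none
    integer :: len, dtype, i
    double precision :: invec(len), inoutvec(len)
    do i = 1, len
        inoutvec(i) = invec(i) + inoutvec(i)
    end do
end subroutine my_sum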