I have this program, and I would expect every image to print 1 2 when it is run with 2 images. Instead, image 1 prints 1 1 and image 2 prints 1 2.

program main
    implicit none
    double precision, allocatable :: a[:]

    allocate(a[*])

    a = this_image()
    sync all
    write(*, *) this_image(), a[1], a[2]

    deallocate(a)
end program main

It is compiled with gfortran -fcoarray=lib minimal.f90 -lcaf_mpich and run with mpirun.mpich -n 2 ./a.out

I am using gfortran 12.2.0 and OpenCoarrays 2.10.1 with MPICH 4.0.2.

Exact output is

           1   1.0000000000000000        1.0000000000000000
           2   1.0000000000000000        2.0000000000000000
[1689752680.508753] [thomas-laptop:14602:0]       tag_match.c:62   UCX  WARN  unexpected tag-receive descriptor 0x55b38d7fb8c0 was not matched
[1689752680.509388] [thomas-laptop:14601:0]       tag_match.c:62   UCX  WARN  unexpected tag-receive descriptor 0x564edbce58c0 was not matched
  • It's reasonable to expect that output. To start checking for inconsistencies in your toolchain, can you compile with `caf` and run with `cafrun`? – francescalus Jul 19 '23 at 07:58
  • @francescalus It gives the same output. (There was a mistake in cafrun.mpich where it called the openmpi version of mpiexec on my system, which gave the output of the comment I now deleted. After the fix the output is the same.) – asdfldsfdfjjfddjf Jul 19 '23 at 08:17
  • In which case I'm afraid I can't help. I get the expected output with each of the compiler/MPI implementations I use (GCC+MPICH isn't one of them). I can suggest only trying a different setup yourself, perhaps in a container with a clean installation. – francescalus Jul 19 '23 at 09:34
  • @francescalus I didn't even think to doubt my toolchain, so that was helpful. Thank you! With openmpi it works as expected. – asdfldsfdfjjfddjf Jul 19 '23 at 13:48

1 Answer

Your reasoning about the expected output is correct, and the incorrect result appears to come from a problem in the toolchain. Because this is a reasonably minimal case, though, it can be educational to look at the formal statement of why that result is guaranteed.

The program of the question consists of two segments:

  • one segment leading up to the sync all
  • one segment following the sync all

A write to a variable is visible to a read from that variable if the write "precedes" the read. In the case of the question, the write (a=this_image()) on each image precedes the read (write(*,*) this_image(), a[1], a[2]) on every image.

"Preceding" is precisely defined in terms of segment orders, which in this case we can state as:

  • segment 1 on image 1 is ordered before segment 2 on image 1
  • segment 1 on image 2 is ordered before segment 2 on image 2
  • segment 1 on image 1 is ordered before segment 2 on image 2
  • segment 1 on image 2 is ordered before segment 2 on image 1
  • segment 1 on image 1 is unordered with respect to segment 1 on image 2
  • segment 2 on image 1 is unordered with respect to segment 2 on image 2

(with appropriate "after" ordering).

Each image defines its own a in segment 1; every reference to a (on any image) happens in segment 2. Segment 2 on each image is ordered after segment 1 on every image, so the definition of a on each image is "safe".
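As a rough illustration of the two segments, here is a sketch of the question's program with the segments marked in comments and a check added to segment 2. The program name segments, the loop variable i, and the error message are my own additions rather than part of the original; the check assumes only that a on image i should hold the value i.

```fortran
program segments
    implicit none
    double precision, allocatable :: a[:]
    integer :: i                      ! hypothetical loop variable, not in the original

    allocate(a[*])

    ! Segment 1: each image defines only its own copy of a.
    a = this_image()

    ! The image control statement that separates the two segments.
    sync all

    ! Segment 2: every image references a on every image.
    ! Each of these reads is ordered after every definition in segment 1.
    do i = 1, num_images()
        if (a[i] /= real(i, kind(a))) then
            write(*, *) 'image', this_image(), 'read a[', i, '] =', a[i]
            error stop 'unexpected value in coindexed read'
        end if
    end do

    deallocate(a)
end program segments
```

With a conforming toolchain this should finish silently on any number of images. If it stops with the error, that points at the compiler, the coarray library, or the MPI layer rather than at the program logic.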

(If we remove the sync all, we have just one segment, and the definition of a[1] on image 1 doesn't formally precede the reference to a[1] on image 2, even though it may happen to do so in practice. This meaning of "precede" is the one we consider in relation to data races in other contexts.)

francescalus