
I'm currently using gfortran 4.9.2 and I was wondering if the compiler actually knows how to take advantage of the DO CONCURRENT construct (Fortran 2008). I know that the compiler "supports" it, but it is not clear what that entails. For example, if automatic parallelization is turned on (with some number of threads specified), does the compiler know how to parallelize a do concurrent loop?

Edit: As mentioned in the comment, this previous question on SO is very similar to mine, but it is from 2012, and only very recent versions of gfortran have implemented the newest features of modern Fortran, so I thought it was worth asking about the current state of the compiler in 2015.

  • Vectorization can be turned on, but that has nothing to do with threads, it is about the SIMD CPU instructions. – Vladimir F Героям слава Apr 28 '15 at 19:37
  • 4
    See also https://gcc.gnu.org/ml/fortran/2014-02/msg00077.html – Vladimir F Героям слава Apr 28 '15 at 19:39
  • OK, so it appears the answer is "No." – Christopher A. Wong Apr 28 '15 at 21:06
  • 1
    You can use a plain DO loop with OpenMP to achieve the same effect. – Jeff Hammond Apr 29 '15 at 02:38
  • possible duplicate of [Parallelizing fortran 2008 \`do concurrent\` systematically, possibly with openmp](http://stackoverflow.com/questions/11550432/parallelizing-fortran-2008-do-concurrent-systematically-possibly-with-openmp) – Alexander Vogt Apr 29 '15 at 12:19
  • @Jeff I would not call it the same effect. You can achieve a parallel loop. Do concurrent may help the compiler perform vectorization even in a non-threaded program in certain conditions, although the referenced mail says that will not be too common. It could also help automatic parallelization if the parallelizer has dependency concerns about a normal do loop. A strange thing is that you cannot mix OpenMP and do concurrent, so I normally just use OpenMP and normal do loops. – Vladimir F Героям слава Apr 29 '15 at 13:06
  • Then add the OpenMP 4 "simd" keyword to your loop as well... – Jeff Hammond Apr 29 '15 at 14:49
  • Also, I'd like to see evidence that `do concurrent` enables better compiler auto-vectorization than `do` with OpenMP `for`. I have spent quite a bit of time studying compiler autovectorization and `do concurrent` does not solve the issue of alignment. Fortran semantics already ensures anti-aliasing and OpenMP `for` implies loop independence, so what more do you think you are getting from `do concurrent`? – Jeff Hammond Apr 29 '15 at 23:14
  • If I thought one got more from do concurrent, I would be using that, but I use OpenMP. I am just saying it is not the same. In particular, one can use it in a program which uses automatic parallelization in other parts and is not ported to OpenMP. – Vladimir F Героям слава Apr 29 '15 at 23:23
  • Thanks for all the comments. I know that one can use OpenMP; I was just wondering if there had been any work on actually implementing something that takes advantage of the "do concurrent" flag. I guess it sounds like it is something for future-proofing, rather than something that people use currently. – Christopher A. Wong Apr 29 '15 at 23:27
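For reference, the OpenMP alternative mentioned in the comments is a plain DO loop annotated with a parallel do directive. A minimal sketch (not from the original thread; program and variable names are illustrative):

```fortran
! Sketch: plain DO loop parallelized with OpenMP instead of DO CONCURRENT.
! Compile with: gfortran -fopenmp example.f90
program omp_equiv
    implicit none
    integer, parameter :: n = 1000000
    real, allocatable  :: q(:)
    integer :: i

    allocate(q(n))

    !$omp parallel do
    do i = 1, n
        q(i) = sqrt(real(i))   ! each iteration is independent
    end do
    !$omp end parallel do

    print *, q(n)
end program omp_equiv
```

Unlike `-ftree-parallelize-loops`, the directive makes the parallelism explicit rather than leaving it to the compiler's dependence analysis.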

1 Answer


Rather than explicitly enabling some new functionality, DO CONCURRENT in gfortran seems to put restrictions on the programmer in order to implicitly allow parallelization of the loop when requested (with the option -ftree-parallelize-loops=NPROC).

While a DO loop can contain any procedure call, the body of DO CONCURRENT is restricted to PURE procedures (i.e., those having no side effects). So when one attempts to call, e.g., RANDOM_NUMBER (which is not PURE, as it needs to maintain the state of the generator) inside DO CONCURRENT, gfortran will protest:

prog.f90:25:29:

   25 |         call random_number(x)
      |                             1
Error: Subroutine call to intrinsic ‘random_number’ in DO CONCURRENT block at (1) is not PURE
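A common workaround (a sketch of my own, not part of the original answer) is to hoist the impure call out of the loop: RANDOM_NUMBER accepts an array argument, so the numbers can be generated in one call beforehand and only consumed inside the DO CONCURRENT block:

```fortran
! Sketch: hoisting the impure RANDOM_NUMBER call out of DO CONCURRENT.
program rng_workaround
    implicit none
    integer, parameter :: n = 100
    real :: x(n), y(n)
    integer :: i

    call random_number(x)       ! impure call, made outside the loop

    do concurrent (i = 1:n)     ! body now uses only pure operations
        y(i) = 2.0 * x(i) - 1.0
    end do

    print *, minval(y), maxval(y)
end program rng_workaround
```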

Otherwise, DO CONCURRENT behaves like a normal DO. It only enforces the use of parallelizable code, so that -ftree-parallelize-loops=NPROC succeeds. For instance, with gfortran 9.1 and -fopenmp -Ofast -ftree-parallelize-loops=4, both the standard DO and the F08 DO CONCURRENT loops in the following program run in 4 threads and with virtually identical timing:

program test_do

    use omp_lib, only: omp_get_wtime
    implicit none

    integer, parameter :: n = 1000000, m = 10000
    real, allocatable  :: q(:)

    integer :: i, j
    double precision :: t0   ! omp_get_wtime() returns double precision

    allocate(q(n))

    t0 = omp_get_wtime()
    do i = 1, n
        q(i) = i
        do j = 1, m
            q(i) = 0.5 * (q(i) + i / q(i))
        end do
    end do
    print *, omp_get_wtime() - t0

    t0 = omp_get_wtime()
    do concurrent (i = 1:n)
        q(i) = i
        do j = 1, m
            q(i) = 0.5 * (q(i) + i / q(i))
        end do
    end do
    print *, omp_get_wtime() - t0

end program test_do
jacob
  • 1,535
  • 1
  • 11
  • 26
  • 1
    When I run your program with gfortran -O3 or -Ofast and -ftree-parallelize-loops=4 it often prints negative timings. Any idea what is going on? As far as I can tell omp_get_wtime() is not called inside the loops so it seems it should have returned correct and consistent timings. I do see the program using multiple processors so openmp and gfortran parallelization is working. Weird. – Ryan Dec 17 '20 at 19:02