I am trying to call a subroutine in a loop. This subroutine has a local coarray. Following is the code that I am using:

! Test local coarray in procedure called in a loop.
!
program main
    use, intrinsic :: iso_fortran_env, only : input_unit, output_unit, error_unit

    implicit none

    ! Variable declaration.
    integer :: me, ti
    integer :: GHOST_WIDTH, TSTART, TSTEPS

    sync all

    ! Initialize.
    GHOST_WIDTH = 1
    TSTART = 0
    TSTEPS = 100000
    me = this_image()

    ! Iterate.
    do ti = TSTART + 1, TSTART + TSTEPS
        call Aldeal( GHOST_WIDTH )
        if ( me == 1 ) write( output_unit, * ) ti
    end do

    if ( me == 1 ) write( output_unit, * ) "All done!"

    contains
        subroutine Aldeal( width )
            integer, intent(in) :: width

            integer, allocatable, codimension[:] :: shell1_Co, shell2_Co, shell3_Co

            allocate( shell1_Co[*], shell2_Co[*], shell3_Co[*] )

            deallocate( shell1_Co, shell2_Co, shell3_Co )

            return
        end subroutine Aldeal
end program main

Right now the subroutine does nothing except allocate the local coarrays and deallocate them. Even so, the program fails with the following error after some iterations:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
In coarray image 1
Image              PC                Routine            Line        Source             
coarray_main       0000000000406063  Unknown               Unknown  Unknown
libpthread-2.17.s  00007F21D8B845F0  Unknown               Unknown  Unknown
libicaf.so         00007F21D90970D5  for_rtl_ICAF_CO_D     Unknown  Unknown
coarray_main       0000000000405054  main_IP_aldeal_            37  coarray_main.f90
coarray_main       0000000000404AEC  MAIN__                     23  coarray_main.f90
coarray_main       0000000000404A22  Unknown               Unknown  Unknown
libc-2.17.so       00007F21D85C5505  __libc_start_main     Unknown  Unknown
coarray_main       0000000000404929  Unknown               Unknown  Unknown

Abort(0) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000003, 0) - process 0

And the same error is repeated for other images as well.

Line 23 is call Aldeal( GHOST_WIDTH ) inside the do loop of the main program. And line 37 corresponds to deallocate( shell1_Co, shell2_Co, shell3_Co ) statement in the subroutine.

Additionally, if I remove the deallocate statement from the subroutine, it throws the same error, but this time the line numbers in the error message are 23 and 39. Line 39 corresponds to the end subroutine Aldeal statement.

I am not able to understand what exactly I am doing wrong. Please help.

P.S. I am using Centos 7 with Intel(R) Parallel Studio XE 2019 Update 4 for Linux.

Ombrophile
  • In cases like this you are usually much better off contacting the compiler vendor for help. I suggest posting on the Intel forums, or using your support contract. (Also fails with the beta for next release.) – francescalus Sep 22 '19 at 17:15
  • Runs to the end successfully with GNU Fortran 9.2 + OpenCoarrays 2.7.1 + Open MPI 4.0.1. – jacob Sep 22 '19 at 17:48
  • @francescalus, thank you. I posted the question in the Intel forum earlier, but that post has not been published yet. So I posted it here as well, hoping to get some help. – Ombrophile Sep 22 '19 at 17:50
  • Right @jacob. It runs fine when compiled using gfortran. But my problem is CentOS lacks opencoarrays support. Therefore, I am stuck with Intel Fortran compiler. – Ombrophile Sep 22 '19 at 17:52
  • As a workaround (if it's suitable), you can make the coarrays local variables (with the SAVE attribute) and do any necessary bookkeeping. – francescalus Sep 22 '19 at 23:00
  • I am afraid I won't be able to add the save attribute to the local coarrays in all of the subroutines that I have. The coarrays are not just scalar variables. Most of the time they are 2D or 3D arrays. If I do so, I will soon be out of memory. Thanks for the suggestion though. – Ombrophile Sep 23 '19 at 18:16
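
The SAVE workaround suggested in the comments above could look roughly like the following. This is only a minimal sketch under assumptions: the names `mod_saved_coarray`, `Aldeal_saved` and `shell_Co` are hypothetical, the coarray is assumed to be a 1D integer array, and `width` is assumed to be the same on every image in every call. The "bookkeeping" here is just a check of whether the saved coarray already exists with the requested size.

```fortran
module mod_saved_coarray
    implicit none

    contains
        subroutine Aldeal_saved( width )
            integer, intent(in) :: width

            ! SAVE keeps the coarray allocated between calls, so the
            ! allocate/deallocate pair is no longer executed every iteration.
            integer, allocatable, save :: shell_Co(:)[:]

            ! Bookkeeping: reallocate only if the requested size changed.
            ! Every image must take the same branch, because allocating and
            ! deallocating a coarray synchronizes all images.
            if ( allocated( shell_Co ) ) then
                if ( size( shell_Co ) /= width ) then
                    deallocate( shell_Co )
                end if
            end if
            if ( .not. allocated( shell_Co ) ) then
                allocate( shell_Co(width)[*] )
            end if

            ! Placeholder work on the local part of the coarray.
            shell_Co = this_image()
            sync all
        end subroutine Aldeal_saved
end module mod_saved_coarray


program main
    use :: mod_saved_coarray

    implicit none

    integer :: ti

    do ti = 1, 1000
        call Aldeal_saved( 3 )
    end do

    if ( this_image() == 1 ) write( *, * ) "All done!"
end program main
```

With this pattern each routine holds on to one allocation of the most recently requested size, which is the memory cost raised in the reply above. Whether that cost is acceptable depends on how many such routines exist and how large their arrays are.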

1 Answer

Observations:

If I modify the code to use a derived type with an allocatable component, and create the coarray in the subroutine from that type, the code runs a little longer but eventually aborts with an error. The modified code is:

module mod_coarray_error
    implicit none

    type :: int_t
        integer, allocatable, dimension(:) :: var
    end type int_t

    contains
        subroutine Aldeal_type( width )
            integer, intent(in) :: width

            type(int_t), allocatable, codimension[:] :: int_t_Co

            allocate( int_t_Co[*] )

            allocate( int_t_Co%var(width) )
            sync all

            ! deallocate( int_t_Co%var )
            deallocate( int_t_Co )

            return
        end subroutine Aldeal_type
end module mod_coarray_error


program main
    use, intrinsic :: iso_fortran_env, only : input_unit, output_unit, error_unit
    use :: mod_coarray_error

    implicit none

    ! Variable declaration.
    integer :: me, ti
    integer :: GHOST_WIDTH, TSTART, TSTEPS, SAVET

    sync all

    ! Initialize.
    GHOST_WIDTH = 3
    TSTART = 0
    TSTEPS = 100000
    SAVET = 1000
    me = this_image()

    ! Iterate.
    do ti = TSTART + 1, TSTART + TSTEPS
        sync all
        call Aldeal_type( GHOST_WIDTH )
        if ( mod( ti, SAVET ) == 0 ) then
            if ( me == 1 ) write( output_unit, * ) ti
        end if
    end do

    sync all

    if ( me == 1 ) write( output_unit, * ) "All done!"
end program main

Additionally, this code runs fine to the end when compiled on Windows.

Now if I add the compiler option -heap-arrays 0, the code seems to run to the end on Linux as well.

I tried increasing the number of loop iterations, i.e., TSTEPS in the code, to 1e7. Even then, it runs successfully to the end. But I observe the following effects:

  1. The code gets slower as the loop count increases, i.e., it takes more time to run from ti = 1e6 to ti = 2e6 than from ti = 1 to ti = 1e6.
  2. Memory used by the program keeps increasing, i.e., each image, which consumes 2GB at the start of the run, consumes 3.5GB at ti = 2e6, 4.7GB at ti = 4e6, and 6GB at ti = 6e6.
  3. Memory used by the program is lower when run on Windows, but it still keeps increasing with the loop count. E.g., each image, which consumes 100MB at the start, consumes 1.5GB at ti = 2e6, 2.5GB at ti = 4e6, and 3.5GB at ti = 6e6.
  4. Using the compiler option /heap-arrays:0 on Windows has no effect either on the run (it was already running successfully without it) or on the amount of memory consumed while running.
  5. The original code posted in the question still throws an error even when compiled with the above compiler option. It does not seem to run on Windows either.
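
The slowdown in point 1 can be quantified by timing fixed-size blocks of iterations. The following is a minimal sketch of that idea, not the exact code behind the numbers above: the names `time_blocks`, `alloc_dealloc`, `work_Co` and `BLOCK` are hypothetical, and the loop count is kept small so the sketch finishes quickly.

```fortran
program time_blocks
    use, intrinsic :: iso_fortran_env, only : int64, real64, output_unit

    implicit none

    type :: int_t
        integer, allocatable, dimension(:) :: var
    end type int_t

    ! Hypothetical sizes; increase TSTEPS to reproduce longer runs.
    integer, parameter :: TSTEPS = 100000, BLOCK = 10000
    integer :: me, ti
    integer(int64) :: t_prev, t_now, rate
    real(real64) :: elapsed

    me = this_image()
    call system_clock( t_prev, rate )

    do ti = 1, TSTEPS
        call alloc_dealloc( 3 )
        if ( mod( ti, BLOCK ) == 0 ) then
            ! Wall-clock time spent on the last BLOCK iterations.
            call system_clock( t_now )
            elapsed = real( t_now - t_prev, real64 ) / real( rate, real64 )
            if ( me == 1 ) write( output_unit, '(a,i0,a,f10.3,a)' ) &
                "block ending at ti = ", ti, " took ", elapsed, " s"
            t_prev = t_now
        end if
    end do

    contains
        subroutine alloc_dealloc( width )
            integer, intent(in) :: width

            type(int_t), allocatable, codimension[:] :: work_Co

            ! Same allocate/deallocate pattern as Aldeal_type above.
            allocate( work_Co[*] )
            allocate( work_Co%var(width) )
            sync all
            deallocate( work_Co )
        end subroutine alloc_dealloc
end program time_blocks
```

If the per-block times grow steadily while the work per iteration stays constant, that would be consistent with the memory growth described in points 2 and 3, though it does not by itself identify the cause.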

Ultimately, I am still confused as to what is happening.

P.S. I posted the question in Intel forum but have not received any response yet.

Ombrophile