I read that manual deep copying of Fortran derived types is possible, but the following simple test program fails at run time, even though it compiles cleanly with PGI 16.10. What am I getting wrong?

program Test

    implicit none

    type dt
        integer :: n
        real, dimension(:), allocatable :: xm
    end type dt

    type(dt) :: grid
    integer :: i

    grid%n = 10
    allocate(grid%xm(grid%n))

!$acc enter data copyin(grid)
!$acc enter data pcreate(grid%xm)

!$acc kernels
   do i = 1, grid%n
      grid%xm(i) = i * i
   enddo
!$acc end kernels

   print*,grid%xm

end program Test

The error I am getting is:

call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution
danny
  • According to the documentation (PGI OpenACC guide, v2015 and v2017): "Arrays of derived type, where the derived type contains allocatable members, have not been tested and should not be considered supported for this release." https://stackoverflow.com/questions/45233207/allocatable-arrays-in-cuda-fortran-device-data-structures#comment77460575_45233207 – Vladimir F Героям слава Jul 31 '17 at 21:43
  • It turns out that commenting out the pcreate(grid%xm) directive makes the program run properly. Could this mean that deep copy is now supported? – danny Aug 01 '17 at 15:10
  • *"have not been tested and should not be considered supported"*... But that is for arrays. You have a single variable, so I don't know; try searching the manual. – Vladimir F Героям слава Aug 01 '17 at 22:09

You just need to add a "present(grid)" clause on the kernels directive.

Here's your program with the fix, plus a few other changes: an update directive so the results can be printed on the host, and cleanup of the device and host memory at the end.

% cat test.f90
program Test

    implicit none

    type dt
        integer :: n
        real, dimension(:), allocatable :: xm
    end type dt

    type(dt) :: grid
    integer :: i

    grid%n = 10
    allocate(grid%xm(grid%n))

!$acc enter data copyin(grid)
!$acc enter data create(grid%xm)
!$acc kernels present(grid)
   do i = 1, grid%n
      grid%xm(i) = i * i
   enddo
!$acc end kernels
!$acc update host(grid%xm)
   print*,grid%xm

!$acc exit data delete(grid%xm, grid)
   deallocate(grid%xm)

end program Test

% pgf90 -acc test.f90 -Minfo=accel -ta=tesla -V16.10; a.out
test:
     16, Generating enter data copyin(grid)
     17, Generating enter data create(grid%xm(:))
     18, Generating present(grid)
     19, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
         19, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     23, Generating update self(grid%xm(:))
    1.000000        4.000000        9.000000        16.00000
    25.00000        36.00000        49.00000        64.00000
    81.00000        100.0000

Note that PGI 17.7 will include beta support for true deep copy in Fortran, as opposed to the manual deep copy you have above. Here's an example using true deep copy (enabled with -ta=tesla:deepcopy):

% cat test_deep.f90
program Test

    implicit none

    type dt
        integer :: n
        real, dimension(:), allocatable :: xm
    end type dt

    type(dt) :: grid
    integer :: i

    grid%n = 10
    allocate(grid%xm(grid%n))

!$acc enter data copyin(grid)
!$acc kernels present(grid)
   do i = 1, grid%n
      grid%xm(i) = i * i
   enddo
!$acc end kernels
!$acc update host(grid)
   print*,grid%xm

!$acc exit data delete(grid)
   deallocate(grid%xm)

end program Test

% pgf90 -acc test_deep.f90 -Minfo=accel -ta=tesla:deepcopy -V17.7 ; a.out
test:
     16, Generating enter data copyin(grid)
     17, Generating present(grid)
     18, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
         18, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     22, Generating update self(grid)
    1.000000        4.000000        9.000000        16.00000
    25.00000        36.00000        49.00000        64.00000
    81.00000        100.0000
Mat Colgrove
  • Mat, thanks a lot; I am going to use deep copy. Why is the present clause still necessary, even in v17.7? Shouldn't the implicit copyin/present of grid, which I expect to happen when using a kernels region without a data clause, work equally well? Thanks again! – danny Sep 07 '17 at 16:49
  • I consider it a bug, since you're correct: with deep copy, the implicit copy should just work. Granted, deep copy is new and a beta feature, so these types of issues are not unexpected. – Mat Colgrove Sep 07 '17 at 17:20