
I am trying to offload several nested for loops in Fortran using OpenMP with the IBM XL compiler suite. 90% of the routines are straightforward, but a handful of the loops involve private 1D arrays whose size is unknown at compile time. The size will always be on the order of 10 elements, which is very manageable in terms of thread stack memory. Here is an example loop:

implicit none
real, dimension(1:nseq) :: yy    ! nseq is a global variable, usually 1-10

!$omp target teams distribute parallel do collapse(3) schedule(static,1) &
!$omp& private(i, j, k) &
!$omp& private(yy) &
!$omp& shared(ne)
do k=1,30
  do j=1,30
    do i=1,30

      yy = dummy_array(i,j,k,6:ne)  ! nseq is equal to ne-6... ne is a global variable
                                    ! dummy_array is an allocatable array that exists
                                    ! persistently on the GPU
      ! ...
      ! do stuff with yy
      ! ...
    end do
  end do
end do

With this standard method I get a lot of memory issues, ranging from "out of memory" errors to "an illegal memory access was encountered".

If I go in and hard-code what I know the value of nseq will be ahead of time, i.e.

implicit none
real, dimension(1:10) :: yy

Then I have no issues at all, so I am not ACTUALLY running out of memory on the GPU. This is obviously bad practice as these values change from case to case and are run time parameters.

I have experimented with ENV variables such as OMP_HEAPSIZE and OMP_STACKSIZE with no luck.

Thanks for taking a look!

Kschau
  • Use an allocatable array instead of an automatic one (allocated on the heap not on the stack) and allocate it for each thread/team – Francois Jacq May 17 '20 at 20:32
  • @FrancoisJacq Thank you. Yes, this issue has gotten me delving into stack and heap memory, and it seems like this is the right track. How can I allocate for each thread? With a Fortran allocate statement inside the loop? Or with an OMP directive at the top? Allocating with Fortran does not make the code very portable. – Kschau May 18 '20 at 12:45
  • Declare the array private, then allocate it with Fortran right after $omp parallel. That way it is allocated once for each thread. – Francois Jacq May 18 '20 at 16:39
  • I never tried teams up to now. So I don't know what is the best in that case... – Francois Jacq May 18 '20 at 16:41
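The suggestion in the comments above can be sketched roughly as follows. This is a minimal host-side illustration, not the code from the question: the module name `yy_private_sketch` and the function `slice_total` are hypothetical, the slice `7:ne` is an assumption chosen so that it has exactly `nseq = ne - 6` elements, and it uses a plain `parallel` region on the CPU. Whether an `allocate` inside a `target teams` region works depends on the compiler and device, and the thread does not establish that.

```fortran
module yy_private_sketch
  implicit none
contains
  ! Hypothetical helper: sums dummy_array(i,j,k,7:ne) over all grid points,
  ! using a per-thread scratch array yy that lives on the heap.
  function slice_total(dummy_array, ne) result(total)
    real, intent(in)    :: dummy_array(:,:,:,:)
    integer, intent(in) :: ne
    real                :: total
    real, allocatable   :: yy(:)      ! unallocated on entry, so each private copy starts unallocated
    integer             :: i, j, k, nseq

    nseq  = ne - 6                    ! assumption: matches the slice 7:ne below
    total = 0.0

    !$omp parallel private(yy, i, j, k) shared(dummy_array, ne, nseq) reduction(+:total)
    allocate(yy(nseq))                ! executed once per thread, not once per iteration
    !$omp do collapse(3)
    do k = 1, size(dummy_array, 3)
      do j = 1, size(dummy_array, 2)
        do i = 1, size(dummy_array, 1)
          yy = dummy_array(i, j, k, 7:ne)
          total = total + sum(yy)     ! stand-in for "do stuff with yy"
        end do
      end do
    end do
    !$omp end do
    deallocate(yy)
    !$omp end parallel
  end function slice_total
end module yy_private_sketch
```

A small driver, again only for illustration:

```fortran
program demo
  use yy_private_sketch
  implicit none
  real, allocatable :: dummy_array(:,:,:,:)
  allocate(dummy_array(30, 30, 30, 16))   ! ne = 16 is a made-up runtime value
  dummy_array = 1.0
  print *, slice_total(dummy_array, 16)
end program demo
```

Because the allocation sits between `parallel` and `do`, its cost is paid once per thread rather than once per loop iteration.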

1 Answer


This turns out to be a compiler quirk/bug with the IBM XL compiler suite I was using.

A workaround that is not very desirable, but is effective, is to manually privatize the temporary array by giving it the loop dimensions, so that each iteration works on its own slice, i.e.

real, dimension(30,30,30,1:nseq) :: yy

!$omp target teams distribute parallel do collapse(3) schedule(static,1) &
!$omp& private(i, j, k) &
!$omp& shared(yy, ne)
do k=1,30
  do j=1,30
    do i=1,30

      yy(i,j,k,:) = dummy_array(i,j,k,6:ne)

      ! ...
      ! do stuff with yy(i,j,k,:)
      ! ...
    end do
  end do
end do
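For reference, here is a self-contained sketch of the same manual-privatization idea. It makes several assumptions that are not in my original code: the module name `yy_manual_private_sketch`, the subroutine `slice_sums`, and the `out` array are hypothetical, the slice `7:ne` is chosen so that it has exactly `ne - 6` elements, and `yy` is not listed in `private` but mapped with `map(alloc: ...)` so that all threads share one device copy and each (i,j,k) touches only its own slice. Compiled without offloading, the directives are ignored and the loop runs on the host.

```fortran
module yy_manual_private_sketch
  implicit none
contains
  ! Hypothetical helper: for every grid point, copy a slice into that point's
  ! own part of yy and store the sum of the slice in out(i,j,k).
  subroutine slice_sums(dummy_array, ne, out)
    real, intent(in)    :: dummy_array(:,:,:,:)
    integer, intent(in) :: ne
    real, intent(out)   :: out(:,:,:)
    ! Scratch array with the loop dimensions in front: one slice per iteration.
    real    :: yy(size(out,1), size(out,2), size(out,3), ne-6)
    integer :: i, j, k

    !$omp target teams distribute parallel do collapse(3) &
    !$omp& private(i, j, k) &
    !$omp& map(to: dummy_array) map(alloc: yy) map(from: out)
    do k = 1, size(out, 3)
      do j = 1, size(out, 2)
        do i = 1, size(out, 1)
          yy(i,j,k,:) = dummy_array(i, j, k, 7:ne)
          out(i,j,k)  = sum(yy(i,j,k,:))   ! stand-in for "do stuff with yy"
        end do
      end do
    end do
    !$omp end target teams distribute parallel do
  end subroutine slice_sums
end module yy_manual_private_sketch
```

The cost of this approach is memory: the scratch array grows with the full iteration space (30 x 30 x 30 x nseq reals here) instead of with the number of threads, which is part of why the workaround is not very desirable.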
Kschau