I'm seeing problems that appear to come from memory handling in my code, and I've isolated them in the example below. Compiled with the Intel compiler, it fails with a segmentation fault (core dumped) at iteration 1024 (which is why the loop starts at 1000); compiled with gfortran, it does not. If the call to the function is commented out, the call to the subroutine keeps working at least until the end of the loop. My theory is that, for the function call, gfortran takes the memory for Dmat from the heap, whereas Intel takes it from the stack.

Is this the case, or is something else happening? If it is, does that mean that for my code to scale adequately and support Intel compilation I have to write all my procedures as subroutines and give up on functions?

Test case:

module auxmod
implicit none
contains

function matmul3_fun( Amat, Bmat, Cmat ) result( Dmat )
    implicit none
    real*8, intent(in)  :: Amat(:,:)
    real*8, intent(in)  :: Bmat(:,:)
    real*8, intent(in)  :: Cmat(:,:)
    real*8              :: Dmat( size(Amat,1), size(Cmat,2) )   ! shape of (A*B)*C
    real*8, allocatable :: Emat(:,:)
    allocate( Emat( size(Amat,1), size(Bmat,2) ) )              ! shape of A*B
    Emat = matmul( Amat, Bmat )
    Dmat = matmul( Emat, Cmat )
end function matmul3_fun

subroutine matmul3_sub( Amat, Bmat, Cmat, Dmat)
    implicit none
    real*8, intent(in)  :: Amat(:,:)
    real*8, intent(in)  :: Bmat(:,:)
    real*8, intent(in)  :: Cmat(:,:)
    real*8, intent(out) :: Dmat( size(Amat,1), size(Cmat,2) )   ! shape of (A*B)*C
    real*8, allocatable :: Emat(:,:)
    allocate( Emat( size(Amat,1), size(Bmat,2) ) )              ! shape of A*B
    Emat = matmul( Amat, Bmat )
    Dmat = matmul( Emat, Cmat )
end subroutine matmul3_sub

end module auxmod


program size_overflow

use  auxmod
implicit none
real*8, allocatable :: Amat(:,:)
real*8, allocatable :: Bmat(:,:)
real*8, allocatable :: Cmat(:,:)
real*8, allocatable :: Dmat(:,:)
integer :: kk

do kk=1000, 2000
    allocate ( Amat(kk,kk), Bmat(kk,kk), Cmat(kk,kk), Dmat(kk,kk) )
    Amat(:,:) = 1.d0
    Bmat(:,:) = 2.d0
    Cmat(:,:) = 3.d0

    call matmul3_sub( Amat, Bmat, Cmat, Dmat )
    write(*,*) "SUB works for size = ", kk

    Dmat = matmul3_fun( Amat, Bmat, Cmat )
    write(*,*) "FUN works for size = ", kk

    deallocate( Amat, Bmat, Cmat, Dmat )
end do

end program size_overflow
Nordico
  • A clear stack overflow, probably a duplicate. `-heap-arrays` should help. – Vladimir F Героям слава Sep 20 '18 at 18:27
  • Also related, but not duplicate https://stackoverflow.com/questions/24474371/passing-a-noncontiguous-array-section-in-fortran – Vladimir F Героям слава Sep 20 '18 at 18:32
  • You can also make the stack bigger, use an unlimited stack (it is in the links above), or allocate Dmat manually as an allocatable. Note that heap (and therefore also allocatable) arrays are slower to allocate and deallocate. – Vladimir F Героям слава Sep 20 '18 at 18:33
  • Yeah, I already know most of the info in the links (that's roughly what I said at the end of the first paragraph, and why I'm using an explicit Emat as an intermediate). I'm more interested in the last option you gave: what do you mean by allocating Dmat manually? Can I do that with the return value of a function? It shouldn't be a problem in this case, but would it be a problem if I later assign it to a matrix that is not allocatable (as in `non_alloc_mat = matmul3_fun(...)`)? – Nordico Sep 20 '18 at 19:11
  • Manually allocating just means making it allocatable and allocating it with the allocate statement or by assignment. – Vladimir F Героям слава Sep 20 '18 at 20:04
  • OK, I tested that and it seems to work. It now looks as if subroutines give you better/easier control over how memory is used than functions do, which would make them better suited to high-performance code. Is this the case? In view of this I'm still very tempted to throw out functions and work with subroutines only. – Nordico Sep 20 '18 at 20:56
  • Example: if I write a matrix multiplication as a function and want it to accept any size no matter how it is compiled, it always has to create a temporary heap array. Whereas if it were a subroutine, I would have to create temp variables manually in some cases, but I could call the same subroutine with small arrays on the stack or with big allocatable ones. Moreover, in many cases no temp would need to be created at all. – Nordico Sep 20 '18 at 21:01
  • Sorry, I missed one thing here. The compiler should be able to optimize away the temporary array for the array result, so it should not have to be allocatable. – Vladimir F Героям слава Sep 20 '18 at 21:11
  • It is clearly this kind of crash https://stackoverflow.com/questions/12167549/program-crash-for-array-copy-with-ifort but I wonder why the compiler doesn't optimize the temporary away. But `-heap-arrays` does help. – Vladimir F Героям слава Sep 20 '18 at 21:26
  • No idea. I am wondering how much of this is a compiler issue that should be solved with the right compilation options (i.e. always use `-heap-arrays N` by default when compiling with ifort), and how much is a coding issue where either (a) I should bite the performance bullet and explicitly make the result allocatable or (b) I should default to subroutines (unless there is a VERY good reason for using functions) in low-level, performance-seeking code. Right now I'm leaning towards (b). – Nordico Sep 20 '18 at 21:39
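
For reference, the "allocate Dmat manually" option from the comments could look roughly like the sketch below. It is only an illustration under stated assumptions: the names auxmod_alloc, matmul3_fun_alloc, alloc_result_demo, fixed_mat and the kind parameter dp are made up here and do not appear in the original test case. It also shows that an allocatable function result can be assigned to a non-allocatable array, as long as the shapes match. Whether this avoids every stack temporary still depends on the compiler and its options, so it is worth checking against the failing sizes.

```fortran
module auxmod_alloc
    implicit none
    integer, parameter :: dp = kind(1.d0)   ! assumed stand-in for real*8
contains

    ! Hypothetical variant of matmul3_fun: the result is allocatable,
    ! so its storage is requested with an explicit allocate statement.
    function matmul3_fun_alloc( Amat, Bmat, Cmat ) result( Dmat )
        real(dp), intent(in)  :: Amat(:,:)
        real(dp), intent(in)  :: Bmat(:,:)
        real(dp), intent(in)  :: Cmat(:,:)
        real(dp), allocatable :: Dmat(:,:)
        real(dp), allocatable :: Emat(:,:)

        allocate( Emat( size(Amat,1), size(Bmat,2) ) )   ! shape of A*B
        allocate( Dmat( size(Amat,1), size(Cmat,2) ) )   ! shape of (A*B)*C
        Emat = matmul( Amat, Bmat )
        Dmat = matmul( Emat, Cmat )
    end function matmul3_fun_alloc

end module auxmod_alloc

program alloc_result_demo
    use auxmod_alloc
    implicit none
    integer, parameter :: n = 4              ! small size, for illustration only
    real(dp), allocatable :: Amat(:,:), Bmat(:,:), Cmat(:,:)
    real(dp) :: fixed_mat(n,n)               ! deliberately NOT allocatable

    allocate( Amat(n,n), Bmat(n,n), Cmat(n,n) )
    Amat = 1.d0
    Bmat = 2.d0
    Cmat = 3.d0

    ! The allocatable result is assigned to a fixed-shape array;
    ! the shapes must conform (both are n x n here).
    fixed_mat = matmul3_fun_alloc( Amat, Bmat, Cmat )
    write(*,*) "fixed_mat(1,1) = ", fixed_mat(1,1)

    deallocate( Amat, Bmat, Cmat )
end program alloc_result_demo
```

The trade-off is the one already raised above: the allocatable result costs a heap allocation on every call, whereas the subroutine form lets the caller decide where Dmat lives.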

0 Answers