1

I am trying to implement the CARMA GEMM algorithm in Fortran found from this paper CARMA Publication. A link to the source code on Github written in C can be found here: CARMA CILK SGEMM

The algorithm uses divide-and-conquer by splitting the largest matrix dimension from m,n,k into two sub-problems that can be executed in parallel via recursion. The end condition executes GEMM on a portion of the of the original matrix. This code uses Cilk threading, but I am not interested in parallelism quite yet.

I am having an issue understanding how to implement an in-place matrix multiplication in Fortran while using the recursion. The following code is my attempt at splitting the dimension M only, which is still in the testing phase. The code compiles with ifort 16.0, but doesn't execute correctly. In the example provided, C1 and C2 from the main program will have different results. No compiler warnings or runtimes errors occur. Just the wrong final answer.

! ******************************
!         MODULE 
! ******************************
MODULE RECURSION_GEMM_MOD

IMPLICIT NONE
CONTAINS

RECURSIVE SUBROUTINE GEMM_RECURSION(m,n,k,alpha,beta,A,B,C,depth,max_depth)

integer, intent(in) :: m, n, k, depth, max_depth
real, intent(in) :: alpha, beta
real, intent(in) :: A(m,k), B(k,n)
real, intent(inout) :: C(m,n)

integer :: m_half, m_remain

! End condition for recursion - compute C <- alpha*AB + beta*C
IF( depth >= max_depth ) THEN
  CALL SGEMM('N','N', m, n, k, alpha, A, m, B, k, beta, C, m)
  RETURN
END IF

m_half = m/2
m_remain = m - m_half

! Divide GEMM into two block matrix sub-problems (top and bot)

! A_top, C_top
CALL GEMM_RECURSION(m_half,n,k,alpha,beta, A,B,C,depth+1,max_depth)

! A_bot, C_bot
CALL GEMM_RECURSION(m_remain,n,k,alpha,beta,A(m_half+1,1),B,C(m_half+1,1),depth+1,max_depth)

END SUBROUTINE GEMM_RECURSION

END MODULE RECURSION_GEMM_MOD


! **********************************
!         MAIN PROGRAM 
! **********************************
PROGRAM RECURSION_GEMM

USE RECURSION_GEMM_MOD
IMPLICIT NONE

INTEGER :: m,n,k,m_half, m_remain
INTEGER :: depth, max_depth
REAL :: C1(2,4), C2(2,4)

REAL :: A(2,3) = TRANSPOSE( RESHAPE( (/ 1.0, 2.0, 3.0,&
                                    4.0, 5.0, 6.0 /), (/ 3,2 /) ) )
REAL :: B(3,4) =  TRANSPOSE( RESHAPE( (/ 5.0, 6.0, 7.0, 8.0, &
                                     1.0, 2.0, 3.0, 4.0, &
                                     9.0, 10.0, 11.0, 12.0/), (/ 4,3 /) ) )
REAL :: C(2,4) =  TRANSPOSE( RESHAPE( (/ 1.0, 0.0, 1.0, 1.0, &
                                     0.0, 1.0, 0.0, 1.0 /), (/ 4,2 /) ) )            
REAL :: alpha = 1.0, beta = 1.0

m = 2
k = 3
n = 4

! Copy C into C1 for later
C1 = C

depth = 0
max_depth = 1

! Recursion GEMM stored in C1
CALL GEMM_RECURSION(m,n,k,alpha,beta,A,B,C1,depth,max_depth)

! Reference GEMM stored in C2
C2 = alpha*MATMUL(A,B) + beta*C

PRINT*, 'C1 = '
PRINT*, C1
PRINT*, ''

PRINT*, 'C2 = '
PRINT*, C2

END PROGRAM RECURSION_GEMM

I have been looking up every document about Fortran subroutines and recursion, but there isn't that much out there. I have checked into the using pointers, but everything I have seen says they are terrible for performance, which isn't great for GEMM. Any help is appreciated.

  • This compiles, even though you are passing scalars to arrays? But anyway if you want people to help you out a complete program with test input and output data makes them doing that much, much easier. Oh, and even if you do that without that most basic safety belt, Implicit None, I can tell you some people wont look at all. – Ian Bush Apr 06 '18 at 19:32
  • @IanBush Fortran subroutines pass the first element of an array's memory address. A and A(1,1) are both equivalent as far as the compiler is concerned. Test code is included... I already fixed my module not having implicit none. I didn't notice it wasn't there. – Brandon Groth Apr 06 '18 at 19:49
  • 1
    It is not equivalent. It is just allowed in certain cases. And not all Fortran argument passing is just using the address. There are some conditions quoted in https://stackoverflow.com/questions/41630361/passing-scalars-and-array-elements-to-a-procedure-expecting-an-array Also I just found: https://stackoverflow.com/questions/17891508/fortran-pass-a-scalar-in-a-main-routine-to-a-vector-in-a-subroutine-via-call – Vladimir F Героям слава Apr 06 '18 at 19:58
  • I don't understand your problem. What is happening? Are the results wrong? How wrong? Which debugging flags or sanitizations did you try? – Vladimir F Героям слава Apr 06 '18 at 21:44
  • @VladimirF I have updated the post: In the example provided, C1 and C2 from the main program will have different results. No compiler warnings or runtimes errors occur. Just the wrong final answer. I am going to try a pointer solution. It seems I can get around the pointer drawbacks with compiler directives via assumed_aligned, etc. – Brandon Groth Apr 06 '18 at 21:52

1 Answers1

2

The problem is that you are passing m_half, m_remain, depth and max_depth by reference and then assigning them in the routine. This changes the value in every instance of the routine.

The fix is to add the VALUE attribute to the declarations of these arguments. This will pass a writeable, anonymous copy of the value.

Steve Lionel
  • 6,972
  • 18
  • 31