I am trying to implement the CARMA GEMM algorithm in Fortran found from this paper CARMA Publication. A link to the source code on Github written in C can be found here: CARMA CILK SGEMM
The algorithm uses divide-and-conquer by splitting the largest matrix dimension from m,n,k into two sub-problems that can be executed in parallel via recursion. The end condition executes GEMM on a portion of the of the original matrix. This code uses Cilk threading, but I am not interested in parallelism quite yet.
I am having an issue understanding how to implement an in-place matrix multiplication in Fortran while using the recursion. The following code is my attempt at splitting the dimension M only, which is still in the testing phase. The code compiles with ifort 16.0, but doesn't execute correctly. In the example provided, C1 and C2 from the main program will have different results. No compiler warnings or runtimes errors occur. Just the wrong final answer.
! ******************************
! MODULE
! ******************************
MODULE RECURSION_GEMM_MOD
IMPLICIT NONE
CONTAINS
RECURSIVE SUBROUTINE GEMM_RECURSION(m,n,k,alpha,beta,A,B,C,depth,max_depth)
integer, intent(in) :: m, n, k, depth, max_depth
real, intent(in) :: alpha, beta
real, intent(in) :: A(m,k), B(k,n)
real, intent(inout) :: C(m,n)
integer :: m_half, m_remain
! End condition for recursion - compute C <- alpha*AB + beta*C
IF( depth >= max_depth ) THEN
CALL SGEMM('N','N', m, n, k, alpha, A, m, B, k, beta, C, m)
RETURN
END IF
m_half = m/2
m_remain = m - m_half
! Divide GEMM into two block matrix sub-problems (top and bot)
! A_top, C_top
CALL GEMM_RECURSION(m_half,n,k,alpha,beta, A,B,C,depth+1,max_depth)
! A_bot, C_bot
CALL GEMM_RECURSION(m_remain,n,k,alpha,beta,A(m_half+1,1),B,C(m_half+1,1),depth+1,max_depth)
END SUBROUTINE GEMM_RECURSION
END MODULE RECURSION_GEMM_MOD
! **********************************
! MAIN PROGRAM
! **********************************
PROGRAM RECURSION_GEMM
USE RECURSION_GEMM_MOD
IMPLICIT NONE
INTEGER :: m,n,k,m_half, m_remain
INTEGER :: depth, max_depth
REAL :: C1(2,4), C2(2,4)
REAL :: A(2,3) = TRANSPOSE( RESHAPE( (/ 1.0, 2.0, 3.0,&
4.0, 5.0, 6.0 /), (/ 3,2 /) ) )
REAL :: B(3,4) = TRANSPOSE( RESHAPE( (/ 5.0, 6.0, 7.0, 8.0, &
1.0, 2.0, 3.0, 4.0, &
9.0, 10.0, 11.0, 12.0/), (/ 4,3 /) ) )
REAL :: C(2,4) = TRANSPOSE( RESHAPE( (/ 1.0, 0.0, 1.0, 1.0, &
0.0, 1.0, 0.0, 1.0 /), (/ 4,2 /) ) )
REAL :: alpha = 1.0, beta = 1.0
m = 2
k = 3
n = 4
! Copy C into C1 for later
C1 = C
depth = 0
max_depth = 1
! Recursion GEMM stored in C1
CALL GEMM_RECURSION(m,n,k,alpha,beta,A,B,C1,depth,max_depth)
! Reference GEMM stored in C2
C2 = alpha*MATMUL(A,B) + beta*C
PRINT*, 'C1 = '
PRINT*, C1
PRINT*, ''
PRINT*, 'C2 = '
PRINT*, C2
END PROGRAM RECURSION_GEMM
I have been looking up every document about Fortran subroutines and recursion, but there isn't that much out there. I have checked into the using pointers, but everything I have seen says they are terrible for performance, which isn't great for GEMM. Any help is appreciated.