The openmp matrix multiplication

Question

I try to write a Openmp based matrix multiplication code. The multiplication of matrix mm and matrix mmt is diagonal matrix and equal to one. I try normal calculation and Openmp. The normal result is correct, however the Openmp result is wrong. I think it should be relative to the Openmp utilization.

program main
implicit none
double precision,dimension(0:8,0:8)::MM,MMT,MTEMP

MM=reshape((/ 1 ,   1   ,   1   ,   1   ,   1   ,   1   ,   1   ,   1   ,   1   ,   &
-4  ,   -1  ,   -1  ,   -1  ,   -1  ,   2   ,   2   ,   2   ,   2   ,   &
4   ,   -2  ,   -2  ,   -2  ,   -2  ,   1   ,   1   ,   1   ,   1   ,   &
0   ,   1   ,   0   ,   -1  ,   0   ,   1   ,   -1  ,   -1  ,   1   ,   &
0   ,   -2  ,   0   ,   2   ,   0   ,   1   ,   -1  ,   -1  ,   1   ,   &
0   ,   0   ,   1   ,   0   ,   -1  ,   1   ,   1   ,   -1  ,   -1  ,   &
0   ,   0   ,   -2  ,   0   ,   2   ,   1   ,   1   ,   -1  ,   -1  ,   &
0   ,   1   ,   -1  ,   1   ,   -1  ,   0   ,   0   ,   0   ,   0   ,   &
0   ,   0   ,   0   ,   0   ,   0   ,   1   ,   -1  ,   1   ,   -1  /),shape(MM))



MMT=1.d0/36.d0 * reshape  ((/ 4 ,   -4  ,   4   ,   0   ,   0   ,   0   ,   0   ,   0   ,   0   ,   &
                4   ,   -1  ,   -2  ,   6   ,   -6  ,   0   ,   0   ,   9   ,   0   ,   &
                4   ,   -1  ,   -2  ,   0   ,   0   ,   6   ,   -6  ,   -9  ,   0   ,   &
                4   ,   -1  ,   -2  ,   -6  ,   6   ,   0   ,   0   ,   9   ,   0   ,   &
                4   ,   -1  ,   -2  ,   0   ,   0   ,   -6  ,   6   ,   -9  ,   0   ,   &
                4   ,   2   ,   1   ,   6   ,   3   ,   6   ,   3   ,   0   ,   9   ,   &
                4   ,   2   ,   1   ,   -6  ,   -3  ,   6   ,   3   ,   0   ,   -9  ,   &
                4   ,   2   ,   1   ,   -6  ,   -3  ,   -6  ,   -3  ,   0   ,   9   ,   &
                4   ,   2   ,   1   ,   6   ,   3   ,   -6  ,   -3  ,   0   ,   -9  /),shape(MMT))

!$OMP PARALLEL
          call multi(mm,mmt,mtemp)

                PRINT*,MTEMP
!$OMP END PARALLEL


endprogram main

subroutine multi(m1,m2,m3)
double precision,dimension(0:8,0:8)::m1,m2,m3
double precision,dimension(0:80)::mm1,mm2
DOUBLE PRECISION::TEMP
integer::i,j,k
!$OMP DO
do j=0,8
    do i=0,8
        mm1(j*9+i)=m1(i,j)
        mm2(i*9+j)=m2(i,j)
    enddo
enddo
!$OMP ENDDO
!$OMP DO PRIVATE(TEMP,I,J,K)
do j=0,8
    do i=0,8
        temp=0
        do k=0,8
            temp=temp+mm1(j*9+k)*mm2(i*9+k)
        enddo
        m3(i,j)=temp
    enddo
enddo
!$OMP ENDDO
return


endsubroutine

I give the normal version below, and if you compute this, the result is diagnonal 1 matrix.

program main
implicit none
double precision,dimension(0:8,0:8)::MM,MMT,MTEMP

MM=reshape((/ 1 ,   1   ,   1   ,   1   ,   1   ,   1   ,   1   ,   1   ,   1   ,   &
-4  ,   -1  ,   -1  ,   -1  ,   -1  ,   2   ,   2   ,   2   ,   2   ,   &
4   ,   -2  ,   -2  ,   -2  ,   -2  ,   1   ,   1   ,   1   ,   1   ,   &
0   ,   1   ,   0   ,   -1  ,   0   ,   1   ,   -1  ,   -1  ,   1   ,   &
0   ,   -2  ,   0   ,   2   ,   0   ,   1   ,   -1  ,   -1  ,   1   ,   &
0   ,   0   ,   1   ,   0   ,   -1  ,   1   ,   1   ,   -1  ,   -1  ,   &
0   ,   0   ,   -2  ,   0   ,   2   ,   1   ,   1   ,   -1  ,   -1  ,   &
0   ,   1   ,   -1  ,   1   ,   -1  ,   0   ,   0   ,   0   ,   0   ,   &
0   ,   0   ,   0   ,   0   ,   0   ,   1   ,   -1  ,   1   ,   -1  /),shape(MM))



MMT=1.d0/36.d0 * reshape  ((/ 4 ,   -4  ,   4   ,   0   ,   0   ,   0   ,   0   ,   0   ,   0   ,   &
                4   ,   -1  ,   -2  ,   6   ,   -6  ,   0   ,   0   ,   9   ,   0   ,   &
                4   ,   -1  ,   -2  ,   0   ,   0   ,   6   ,   -6  ,   -9  ,   0   ,   &
                4   ,   -1  ,   -2  ,   -6  ,   6   ,   0   ,   0   ,   9   ,   0   ,   &
                4   ,   -1  ,   -2  ,   0   ,   0   ,   -6  ,   6   ,   -9  ,   0   ,   &
                4   ,   2   ,   1   ,   6   ,   3   ,   6   ,   3   ,   0   ,   9   ,   &
                4   ,   2   ,   1   ,   -6  ,   -3  ,   6   ,   3   ,   0   ,   -9  ,   &
                4   ,   2   ,   1   ,   -6  ,   -3  ,   -6  ,   -3  ,   0   ,   9   ,   &
                4   ,   2   ,   1   ,   6   ,   3   ,   -6  ,   -3  ,   0   ,   -9  /),shape(MMT))


                call multi(mm,mmt,mtemp)

                PRINT*,MTEMP



endprogram main

subroutine multi(m1,m2,m3)
double precision,dimension(0:8,0:8)::m1,m2,m3
double precision,dimension(0:80)::mm1,mm2
DOUBLE PRECISION::TEMP
integer::i,j,k

do j=0,8
    do i=0,8
        mm1(j*9+i)=m1(i,j)
        mm2(i*9+j)=m2(i,j)
    enddo
enddo


do j=0,8
    do i=0,8
        temp=0
        do k=0,8
            temp=temp+mm1(j*9+k)*mm2(i*9+k)
        enddo
        m3(i,j)=temp
    enddo
enddo

return


  endsubroutine

Can you show us the serial code that generates the right answers so we can test any code we right duplicates them. At the very least please tell us exactly what operation you are trying to do mathematically, to me it looks like one of the matrices is transposed in the matrix multiplication. Also why are you making the code a lot more complicated by copying into 1D code? That is so 1970s, compilers have improved since then! — Ian Bush, Aug 12 '21 at 07:30
Why are you doing this? If you want an optimised, parallel matrix multiplication routine (and other linear algebra functions), your time is much better spent learning about (and using) BLAS libraries https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms. (They will also include other important optimisations such as vectorisation and cache-blocking). Plus, they are mostly free (as in beer) and optimsed for many different machines — Jim Cownie, Aug 12 '21 at 07:47
The problem is from the parallel region. When I move the omp parallel into subroutin, it works well. — Mac cchiatooo, Aug 12 '21 at 15:16
The problem is caused by you copying the data into the local and therefore private 1d arrays. Get rid of this unnecessary copying, address the matrices as 2d arrays as nature intended, and it all works. Will write proper answer tomorrow - but Jim Cownie is right, unless this is an exercise to learn something nobody should be writing matrix multiplies in Fortran, and shouldn't have been doing for 30 years+. — Ian Bush, Aug 12 '21 at 20:32

Laci · Accepted Answer · 2021-08-12T12:40:53.110

2

To solve your problem one solution is to place the parallel region inside the multi subroutine. This code gives the same result as the serial one:

subroutine multi(m1,m2,m3)
double precision,dimension(0:8,0:8)::m1,m2,m3
double precision,dimension(0:80)::mm1,mm2
DOUBLE PRECISION::TEMP
integer::i,j,k
!$OMP PARALLEL
    !$OMP DO
    do j=0,8
        do i=0,8
            mm1(j*9+i)=m1(i,j)
            mm2(i*9+j)=m2(i,j)
        enddo
    enddo
    !$OMP ENDDO
    !$OMP DO PRIVATE(TEMP,I,J,K)
    do j=0,8
        do i=0,8
            temp=0
            do k=0,8
                temp=temp+mm1(j*9+k)*mm2(i*9+k)
            enddo
            m3(i,j)=temp
        enddo
    enddo
    !$OMP ENDDO
!$OMP END PARALLEL
return

Note that the workload is very small, so it may not be faster than the serial code. Note also that if you do not need mm1 and mm2 arrays later then you do not have to calculate them:

subroutine multi(m1,m2,m3)
double precision,dimension(0:8,0:8)::m1,m2,m3
DOUBLE PRECISION::TEMP
integer::i,j,k
!$OMP PARALLEL
    !$OMP DO PRIVATE(TEMP,I,J,K)
    do j=0,8
        do i=0,8
            temp=0
            do k=0,8
                temp=temp+m1(k,j)*m2(i,k)
            enddo
            m3(i,j)=temp
        enddo
    enddo
    !$OMP ENDDO
!$OMP END PARALLEL
return

edited Aug 12 '21 at 12:40

answered Aug 12 '21 at 05:36

Laci

2,738
1
13
22

i is already private, and there is no need to push the parallel region into the subroutine - see https://stackoverflow.com/questions/35347944/fortran-openmp-with-subroutines-and-functions/35361665#35361665 – Ian Bush Aug 12 '21 at 07:27
Thank you, for the comment, I have modified my answer. – Laci Aug 12 '21 at 12:08
I think the problem is from the parallel position. When I move the parallel region into subroutine, my code works well. But, when I move the parallel region into main, it does not work. Do you know the reason? – Mac cchiatooo Aug 12 '21 at 14:41
I do not know Fortran well enough, I use OpenMP in C++, but I think it is because `double precision,dimension(0:80)::mm1,mm2` are local (private) variables and a different array is created for each thread. The second `!OMP DO` access different elements for the given thread than the first one. You have to make them global arrays to work: (e.g. use `subroutine multi(m1,m2,m3,mm1,mm2)`. – Laci Aug 13 '21 at 04:41
To be more precise the second loop access the whole `mm2` array in each thread, but it is only partially initialized. (`mm1` is OK in this respect). BTW: If you want to use the parallel region outside the subroutine, I recommend using the single construct and using tasks. – Laci Aug 13 '21 at 05:12

The openmp matrix multiplication

1 Answers1