
I have written a small subroutine that multiplies matrices in blocks. However, I was wondering if there is a more efficient way to write it.


subroutine blocks(a, b, c, blocksize)
   implicit none
   integer, parameter :: dp = kind(1.0d0)  ! double-precision kind
   real(kind=dp), dimension(:,:), intent(in)  :: a, b
   real(kind=dp), dimension(:,:), intent(out) :: c
   integer, intent(in) :: blocksize
   integer :: i, j, k, ii, jj, kk, n

   c = 0.0_dp
   n = size(a, 1)

   ! Outer three loops step over blocks; inner three loops run within one
   ! block. Keeping i innermost matches Fortran's column-major storage.
   do jj = 1, n, blocksize
      do kk = 1, n, blocksize
         do ii = 1, n, blocksize
            do j = jj, min(jj + blocksize - 1, n)
               do k = kk, min(kk + blocksize - 1, n)
                  do i = ii, min(ii + blocksize - 1, n)
                     c(i, j) = c(i, j) + a(i, k) * b(k, j)
                  end do
               end do
            end do
         end do
      end do
   end do
end subroutine blocks
Alex
  • What *exactly* does your function do? What does "in blocks" mean? What is the purpose? Note that for ordinary matrix multiplications your code will never be as good as an optimized BLAS library subroutine like DGEMM. – Vladimir F Героям слава Sep 28 '20 at 09:46
  • Unless blocksize is small, BLAS will win. Otherwise it's going to depend on at least the blocksize, the hardware you are running on, the compiler you use, the compiler options you use, the efficiency of any BLAS library that is available, and the phase of the moon. The only way to know which solution is best is to time it. But 4 ideas that I would try are 1) compare with BLAS 2) compare with Matmul 3) multithread 4) unroll the k loop manually – Ian Bush Sep 28 '20 at 09:49
  • Hello! I'm trying to do the multiplication in blocks as in, for example, this link: http://www.netlib.org/utk/papers/autoblock/node2.html. But I was wondering if there is a more efficient way to do it (with fewer loops). Basically I'm trying to see the difference in time between different ways of multiplication. I tried the straightforward way, I tried BLAS and matmul as well, and I'm now trying to do multiplication in blocks – Alex Sep 28 '20 at 09:50
  • The link presents the algorithm as a useful exercise. I have nothing against studying and trying that to learn something. But do not expect you will beat libraries that have been developed for decades and are optimized for individual types of CPUs. Note that the BLAS that is pre-installed in most Linux distributions is not the most optimized one, it is the reference Fortran implementation. Those optimized variants are OpenBLAS, GotoBLAS, ATLAS, MKL and others... – Vladimir F Героям слава Sep 28 '20 at 10:26
  • Hello @VladimirF Yes, I understand what you are saying. Perhaps I didn't make it clear what my purpose is. I was studying different ways of multiplication and how they affect the speed. I saw the block multiplication and was wondering if there is a more efficient way to do it, with fewer loops for example. By efficient I do not mean with respect to other methods, but whether there is a better way to write it/do it – Alex Sep 28 '20 at 10:29
  • Note that if you search around this site you will find quite a few Q/As about blocked matrix multiplication and its efficiency. Often in C or C++, though, but if you transpose the loops, it is the same thing. https://stackoverflow.com/questions/38190006/matrix-multiplication-why-non-blocked-outperforms-blocked https://stackoverflow.com/questions/47685422/best-block-size-value-for-block-matrix-matrix-multiplication – Vladimir F Героям слава Sep 28 '20 at 10:30
  • So what exactly do you mean by "efficient"? Written using subarrays? These shortened syntaxes are often slower. Perhaps just change the word "efficient" to "shorter" in your question, because this way it is quite confusing. But shorter is often slower. – Vladimir F Героям слава Sep 28 '20 at 10:31

0 Answers