-2

I have very tedious task to optimize some ancient Fortran77 code. Honestly, I don't know fortran at all. I know how loops works and how to multiply matrices. I also know that this loop can be optimized to few 3-4 nested loops:

    do i = 1, nocca
     do j = 1, nocca
       do k = 1, noccb                                                                                                               
         do l = 1, noccc
           do m = 1, nva
             do n =1, nvb
               saps = oab(j, n+noccb)
               sbap = oab(j, k)
               sac = oac(i, l)
               scr = oac(m + nocca, l)
               im = i + nocca*(m-1)  
               kn = k + noccb*(n-1)
               imkn = im + oava*(kn-1)
               vrsab = ovovab(imkn)
   demp3 = demp3 + 2.0d0*vrsab*(2.0d0*saps*sbap*sac*scr)
             end do
           end do
         end do
       end do
     end do
   end do

I was trying to calculate sapssac in the separate loop and similarly sacscr:

c      Calculate saps * sbap 
       do j = 1, nocca
         do k  = 1, noccb
           do n = 1, nvb
             saps = oab(j, n + noccb)
             sbap = oab(j, k)   
             saps_sbap(j, k) = saps_sbap(j, k) + saps*sbap
           end do
         end do
       end do

c      Calculate sac_scr     
       do i = 1, nocca
         do l = 1, noccc
           do m = 1, nva
             sac = oac(i, l)
             scr = oac(m + nocca, l)
             sac_scr(i, l) = sac_scr(i, l) + sac*scr
           end do
         end do
       end do

Finally I would like to write the last part to calculate demp3 but there are 5 indices not 4 as I expected. Maybe I'm doing this entirely wrong?

Any suggestions? hints?

Thanks in advance!

Jan Wo
  • 29
  • 5
  • How do you know that the loops are not already optimal? – evets Feb 23 '20 at 17:31
  • 4
    A comment I would make is that you are not traversing the arrays in an optimal memory order. In Fortran, the leftmost array subscript varies fastest, yet your loops use the outermost loop for that index. This is a common mistake made by people who don't know Fortran well. Some compilers will rearrange the loops on their own, but you can help by doing so yourself. – Steve Lionel Feb 23 '20 at 17:41
  • I don't know. I don't even know what those variables are. This is part of computational program containing hundreds of such a construcions. This method is known as O(N^4) but this implementation is O(N^6). This is only one example. I need to reformat several of them but I don't know how to start. If I had one example I think I would be able to figure out the rest. The only way I'm able to check if it is working is to compile and run, then compare results of computations with the results obtained using original code. – Jan Wo Feb 23 '20 at 17:46
  • The main idea is to split those multiplications to few smaller loops. – Jan Wo Feb 23 '20 at 17:47
  • Modern Fortran compiler are good at hoisting invariant computations out of a loop. First, you need a test case that exercises the loops to ensure any change does not adversely affect the outcome. Next, you need to actually measure the execution time. Now, start moving parts of the inner loop to outer loops to (possibly) minimize the number of computations. For example, the multiplication of `2.0d0*sbap*sac` can be moved up 2 layers. – evets Feb 23 '20 at 18:09
  • BTW - this isn't Fortran77, the vintage as shown is at least Fortran90 – Ian Bush Feb 23 '20 at 18:26
  • any explanation for -1? – Jan Wo Feb 23 '20 at 19:03
  • Actually this program is written in fortran 77, I reformatted this loop to get rid of dumb number-references at the end of do statement. – Jan Wo Feb 23 '20 at 19:06

1 Answers1

0

It is clearly easier to downwote than try to understand a problem. I found optimal solution by myself. Here it is:

  1. Split large sum into two components. Let's consider only the first one:

    demp3 = demp3 + 2.0d0*vrsab*(2.0d0*saps*sbap*sac*scr)

  2. Multiply sapssbap and sacscr in two separate loops. (Dimensions can be found as maximal indices in the original loop):

    REAL*16, DIMENSION(nvb, noccb) :: saps_sbap
    REAL*16, DIMENSION(nocca, nva) :: sac_scr
    

following loops multiply sapssbap and sacscr:

c      Calculate saps * sbap 

   do k = 1, noccb
     do n = 1, nvb
       saps_sbap(n, k) = 0.0d0
       do j = 1, nocca
         saps = oab(j, n + noccb)
         sbap = oab(j, k)   
         saps_sbap(n, k) = saps_sbap(n, k) + saps*sbap
       end do
     end do
   end do 

c      Calculate sac * scr
   do i = 1, nocca
     do m = 1, nva
       sac_scr(i, m) = 0.0d0
       do l = 1, noccc
         sac = oac(i, l)
         scr = oac(m + nocca, l)
         sac_scr(i, m) = sac_scr(i, m) + sac*scr 
       end do
     end do
   end do
  1. Finally put all things together in such a loop:

    do i = 1, nocca do k = 1, noccb do m = 1, nva do n = 1, nvb im = i + nocca*(m-1) kn = k + noccb*(n-1) imkn = im +oava*(kn-1) vrsab = ovovab(imkn) demp3 = demp3 + 2.0d0 * vrsab * 2.0d0 *saps_sbap(n,k) * sac_scr(i,m) end do end do end do end do

In such a way instead of O(N^6) loop there are two O(N^3) and one O(N^4)

Sorry for the last piece format, line-end didn't worked.

Jan Wo
  • 29
  • 5