I am working with a Fortran project that simulates vegetation dynamics. The code is slow, so I am always on the lookout for ways to optimize it.
I have read that there is a rule of thumb saying that usually 90% of the run time is spent in 10% of the code. To find these bottlenecks I have started using the Intel VTune performance analyzer. The analysis of the simulation shows that a large amount of time is spent in specific parts of the code, as shown in the images. The most time-consuming part of leaftw_derivs is shown in the next figure.
The code referred to in the analysis is shown below.
!---- Update soil moisture and energy from transpiration/root uptake. ------------------!
if (rk4aux(ibuff)%any_resolvable) then
   do k1 = klsl, mzg ! loop over extracted water
      do k2=k1,mzg
         if (rk4site%ntext_soil(k2) /= 13) then
            !---------------------------------------------------------------------------!
            ! Transpiration happens only when there is some water left down to this !
            ! layer. !
            !---------------------------------------------------------------------------!
            if (rk4aux(ibuff)%avail_h2o_int(k1) > 0.d0) then
               !------------------------------------------------------------------------!
               ! Find the contribution of layer k2 for the transpiration from !
               ! cohorts that reach layer k1. !
               !------------------------------------------------------------------------!
               ext_weight = rk4aux(ibuff)%avail_h2o_lyr(k2) / rk4aux(ibuff)%avail_h2o_int(k1)
               !------------------------------------------------------------------------!
               wloss_tot = 0.d0
               qloss_tot = 0.d0
               wvlmeloss_tot = 0.d0
               qvlmeloss_tot = 0.d0
               do ico=1,cpatch%ncohorts
                  !----- Find the loss from this cohort. -------------------------------!
                  wloss = rk4aux(ibuff)%extracted_water(ico,k1) * ext_weight
                  qloss = wloss * tl2uint8(initp%soil_tempk(k2),1.d0)
                  wvlmeloss = wloss * wdnsi8 * dslzi8(k2)
                  qvlmeloss = qloss * dslzi8(k2)
                  !---------------------------------------------------------------------!
                  !---------------------------------------------------------------------!
                  ! Add the internal energy to the cohort. This energy will be !
                  ! eventually lost to the canopy air space because of transpiration, !
                  ! but we will do it in two steps so we ensure energy is conserved. !
                  !---------------------------------------------------------------------!
                  dinitp%leaf_energy(ico) = dinitp%leaf_energy(ico) + qloss
                  dinitp%veg_energy(ico) = dinitp%veg_energy(ico) + qloss
                  initp%hflx_lrsti(ico) = initp%hflx_lrsti(ico) + qloss
                  !---------------------------------------------------------------------!
                  !----- Integrate the total to be removed from this layer. ------------!
                  wloss_tot = wloss_tot + wloss
                  qloss_tot = qloss_tot + qloss
                  wvlmeloss_tot = wvlmeloss_tot + wvlmeloss
                  qvlmeloss_tot = qvlmeloss_tot + qvlmeloss
                  !---------------------------------------------------------------------!
               end do
               !------------------------------------------------------------------------!
               !----- Update derivatives of water, energy, and transpiration. ----------!
               dinitp%soil_water (k2) = dinitp%soil_water(k2) - wvlmeloss_tot
               dinitp%soil_energy (k2) = dinitp%soil_energy(k2) - qvlmeloss_tot
               dinitp%avg_transloss(k2) = dinitp%avg_transloss(k2) - wloss_tot
               !------------------------------------------------------------------------!
            end if
            !---------------------------------------------------------------------------!
         end if
         !------------------------------------------------------------------------------!
      end do
      !---------------------------------------------------------------------------------!
   end do
   !------------------------------------------------------------------------------------!
end if
!---------------------------------------------------------------------------------------!
I have only a very basic understanding of optimization, and I don't see what could be done here to improve the code. In particular, I don't understand what "Instructions Retired" means or how to act on it. Is there a way to speed up these computations?
EDIT
After giving it a bit more thought, I realized that there are some easy optimizations here: for example, moving the conditional if (rk4aux(ibuff)%avail_h2o_int(k1) > 0.d0) then outside the k2 loop (it depends only on k1), and moving the call tl2uint8(initp%soil_tempk(k2),1.d0) outside the innermost loop (it depends only on k2).
However, I cannot really understand why VTune reports such long times for the following three lines:
dinitp%leaf_energy(ico) = dinitp%leaf_energy(ico) + qloss
dinitp%veg_energy(ico) = dinitp%veg_energy(ico) + qloss
initp%hflx_lrsti(ico) = initp%hflx_lrsti(ico) + qloss
Each of them just performs an addition. This should be extremely fast, yet the analyzer says that a lot of time is spent there. Why would that be?
EDIT2
I rewrote the entire loop, trying to optimize it as much as I could. This is the code I came up with:
!---- Update soil moisture and energy from transpiration/root uptake. ------------------!
if (rk4aux(ibuff)%any_resolvable) then
   do k1 = klsl, mzg ! loop over extracted water
      !---------------------------------------------------------------------------!
      ! Transpiration happens only when there is some water left down to this !
      ! layer. !
      !---------------------------------------------------------------------------!
      if (rk4aux(ibuff)%avail_h2o_int(k1) > 0.d0) then
         wloss_tot_k1 = 0.d0
         do ico=1,cpatch%ncohorts
            !----- Integrate the total to be removed from this layer. ------------!
            wloss_tot_k1 = wloss_tot_k1 + rk4aux(ibuff)%extracted_water(ico,k1)
            !---------------------------------------------------------------------!
         end do
         !------------------------------------------------------------------------!
         do k2=k1,mzg
            if (rk4site%ntext_soil(k2) /= 13) then
               !----- Factors that depend only on k1 and k2 (set once per layer). -------!
               ext_weight = rk4aux(ibuff)%avail_h2o_lyr(k2) / rk4aux(ibuff)%avail_h2o_int(k1)
               uint_here = tl2uint8(initp%soil_tempk(k2),1.d0)
               do ico=1,cpatch%ncohorts
                  wloss = rk4aux(ibuff)%extracted_water(ico,k1) * ext_weight
                  uint_here1 = wloss * uint_here
                  dinitp%leaf_energy(ico) = dinitp%leaf_energy(ico) + uint_here1
                  dinitp%veg_energy(ico) = dinitp%veg_energy(ico) + uint_here1
                  initp%hflx_lrsti(ico) = initp%hflx_lrsti(ico) + uint_here1
               end do
               !------------------------------------------------------------------------!
               wloss_tot = wloss_tot_k1 * ext_weight
               wvlmeloss_tot = wloss_tot * dslzi8(k2) * wdnsi8
               qvlmeloss_tot = wloss_tot * dslzi8(k2) * uint_here
               !----- Update derivatives of water, energy, and transpiration. ----------!
               dinitp%soil_water (k2) = dinitp%soil_water(k2) - wvlmeloss_tot
               dinitp%soil_energy (k2) = dinitp%soil_energy(k2) - qvlmeloss_tot
               dinitp%avg_transloss(k2) = dinitp%avg_transloss(k2) - wloss_tot
               !------------------------------------------------------------------------!
            end if
            !---------------------------------------------------------------------------!
         end do
         !------------------------------------------------------------------------------!
      end if
      !---------------------------------------------------------------------------------!
   end do
   !------------------------------------------------------------------------------------!
end if
!---------------------------------------------------------------------------------------!
It's a bit long, so I don't expect people to go through it. When I run the analyzer now, the time is considerably reduced (from 290 s to 185 s, although in real simulations the speedup seems to be slightly smaller).
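The rewrite replaces the per-cohort accumulation of wloss_tot with a single sum over cohorts per k1 that is then scaled by ext_weight. As a sanity check on that step, here is a minimal, self-contained sketch. The module name, the function names and the input values are made up for illustration and are not part of the model; the one-dimensional array stands in for extracted_water(:,k1).

```fortran
module transp_sketch
   implicit none
   integer, parameter :: dp = kind(1.d0)
contains
   !----- Original form: scale every cohort's loss, then accumulate. -----------!
   function wloss_tot_per_cohort(extracted, ext_weight) result(wloss_tot)
      real(dp), dimension(:), intent(in) :: extracted   ! stand-in for extracted_water(:,k1)
      real(dp),               intent(in) :: ext_weight
      real(dp)                           :: wloss_tot
      integer                            :: ico
      wloss_tot = 0.d0
      do ico = 1, size(extracted)
         wloss_tot = wloss_tot + extracted(ico) * ext_weight
      end do
   end function wloss_tot_per_cohort

   !----- Rewritten form: sum over cohorts once, then scale by the weight. -----!
   function wloss_tot_factored(extracted, ext_weight) result(wloss_tot)
      real(dp), dimension(:), intent(in) :: extracted
      real(dp),               intent(in) :: ext_weight
      real(dp)                           :: wloss_tot
      wloss_tot = sum(extracted) * ext_weight
   end function wloss_tot_factored
end module transp_sketch

program check_factoring
   use transp_sketch
   implicit none
   real(dp) :: extracted(4), ext_weight, a, b

   extracted  = [1.5d0, 0.25d0, 3.d0, 0.75d0]   ! made-up values
   ext_weight = 0.5d0

   a = wloss_tot_per_cohort(extracted, ext_weight)
   b = wloss_tot_factored(extracted, ext_weight)
   print '(a,2es24.16)', 'per-cohort / factored: ', a, b

   ! Compare with a relative tolerance: the two forms round differently in general.
   if (abs(a - b) > 1.d-12 * abs(a)) error stop 'totals differ'
end program check_factoring
```

The two forms are algebraically identical, but in floating point they can differ in the last bits, so comparing the old and new model output with a tolerance rather than bit for bit seems the safer test.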
However, the sampling still attributes a considerable amount of time to operations that I would not expect to be "expensive". I still don't understand what "Instructions Retired" means or how to act on it. For the moment I think this is enough, and I guess that the proper way to get a further speedup would be to use OpenMP, as Holmz is suggesting.
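For reference, this is roughly what an OpenMP version of the innermost cohort loop could look like. It is only a sketch under assumptions: the module, the subroutine name and the flat arrays are hypothetical stand-ins for the dinitp/initp members, and all arrays are assumed to have the same length. Each iteration touches only element ico, so the iterations are independent.

```fortran
module omp_cohort_sketch
   implicit none
   integer, parameter :: dp = kind(1.d0)
contains
   !----- Add the energy from layer k2 to every cohort (hypothetical helper). ---!
   subroutine add_cohort_energy(extracted, ext_weight, uint_here, &
                                leaf_energy, veg_energy, hflx_lrsti)
      real(dp), dimension(:), intent(in)    :: extracted   ! extracted_water(:,k1)
      real(dp),               intent(in)    :: ext_weight, uint_here
      real(dp), dimension(:), intent(inout) :: leaf_energy, veg_energy, hflx_lrsti
      real(dp) :: qloss
      integer  :: ico

      ! Without OpenMP enabled the directives are plain comments and the
      ! loop runs serially.
      !$omp parallel do default(shared) private(ico, qloss)
      do ico = 1, size(extracted)
         qloss = extracted(ico) * ext_weight * uint_here
         leaf_energy(ico) = leaf_energy(ico) + qloss
         veg_energy(ico)  = veg_energy(ico)  + qloss
         hflx_lrsti(ico)  = hflx_lrsti(ico)  + qloss
      end do
      !$omp end parallel do
   end subroutine add_cohort_energy
end module omp_cohort_sketch

program demo_omp
   use omp_cohort_sketch
   implicit none
   integer, parameter :: ncohorts = 1000   ! made-up size
   real(dp) :: extracted(ncohorts), leaf(ncohorts), veg(ncohorts), hflx(ncohorts)

   extracted = 2.d0
   leaf = 0.d0
   veg  = 0.d0
   hflx = 0.d0

   call add_cohort_energy(extracted, 0.5d0, 4.d0, leaf, veg, hflx)

   ! Every cohort should have received 2 * 0.5 * 4 = 4.
   if (any(abs(leaf - 4.d0) > 1.d-12)) error stop 'unexpected leaf_energy'
   print *, 'leaf_energy(1) =', leaf(1)
end program demo_omp
```

OpenMP is enabled with -fopenmp in gfortran or -qopenmp in the Intel compiler. Whether this pays off depends on the number of cohorts: starting a parallel region inside the k1/k2 loops has a cost of its own, so for short cohort loops it could well be slower than the serial version.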