I have the following subroutine, which is my stepper function in a bigger DEM program. It computes every interaction between particles i
and j
, then updates the forces. I am now trying, as a first effort, to parallelize this thick O(N^2) loop, before I try to use fancier search algorithms.
But, I can't find a way to make it work. I know it is due to variables being modified by different threads, but I don't know how to deal with that properly. One possibility would be to redefine all my arrays to be matrices, but I still don't know if this is the correct way.
Also, note that I use include for my variables since I have a lot of variables that need to be accessed and modified by many subroutines, and it felt like the easy way to make that work, I am open to suggestions.
Here is my subroutine:
subroutine stepper (tstep)
use omp_lib
implicit none
include "parameter.h"
include "CB_variables.h"
include "CB_const.h"
include "CB_bond.h"
include "CB_forcings.h"
integer :: i, j
integer, intent(in) :: tstep
! reinitialize force arrays for contact and bonds
do i = 1, n
mc(i) = 0d0
mb(i) = 0d0
fcx(i) = 0d0
fcy(i) = 0d0
fbx(i) = 0d0
fby(i) = 0d0
end do
! put yourself in the referential of the ith particle
! loop through all j particles and compute interactions
!$omp parallel do schedule(dynamic) &
!$omp private(i,j) &
!$omp reduction(+:tfx,tfy,fcx,fcy,fbx,fby,m,mc,mb)
do i = 1, n
do j = i + 1, n
! compute relative position and velocity
call rel_pos_vel (i, j)
! bond initialization
if ( tstep .eq. 1 ) then
if ( -deltan(i, j) .le. 5d-1 * r(i)) then ! can be fancier
!bond (i, j) = 1
end if
call bond_properties (i, j)
end if
! verify if two particles are colliding
if ( deltan(i,j) .gt. 0 ) then
call contact_forces (i, j)
!call bond_creation (i, j) ! to implement
! update contact force on particle i by particle j
fcx(i) = fcx(i) - fcn(i,j) * cosa(i,j)
fcy(i) = fcy(i) - fcn(i,j) * sina(i,j)
! update moment on particule i by particule j due to tangent contact
mc(i) = mc(i) - r(i) * fct(i,j) - mcc(i,j)
! Newton's third law
! update contact force on particle j by particle i
fcx(j) = fcx(j) + fcn(i,j) * cosa(i,j)
fcy(j) = fcy(j) + fcn(i,j) * sina(i,j)
! update moment on particule j by particule i due to tangent contact
mc(j) = mc(j) - r(j) * fct(i,j) + mcc(i,j)
end if
! compute forces from bonds between particle i and j
if ( bond (i, j) .eq. 1 ) then
call bond_forces (i, j)
!call bond_breaking (i, j)
! update force on particle i by particle j due to bond
fbx(i) = fbx(i) - fbn(i,j) * cosa(i,j) + &
fbt(i,j) * sina(i,j)
fby(i) = fby(i) - fbn(i,j) * sina(i,j) - &
fbt(i,j) * cosa(i,j)
! update moment on particule i by particule j to to bond
mb(i) = mb(i) - r(i) * fbt(i,j) - mbb(i, j)
! Newton's third law
! update force on particle j by particle i due to bond
fbx(j) = fbx(j) + fbn(i,j) * cosa(i,j) - &
fbt(i,j) * sina(i,j)
fby(j) = fby(j) + fbn(i,j) * sina(i,j) + &
fbt(i,j) * cosa(i,j)
! update moment on particule j by particule i to to bond
mb(j) = mb(j) - r(i) * fbt(i,j) + mbb(j, i)
end if
! compute sheltering height for particule j on particle i for air and water drag
call sheltering(i, j)
end do
! compute the total forcing from winds, currents and coriolis on particule i
call forcing (i)
!call coriolis(i)
! reinitialize total force arrays before summing everything
m(i) = 0d0
tfx(i) = 0d0
tfy(i) = 0d0
! sum all forces together on particule i
tfx(i) = fcx(i) + fbx(i) + fax(i) + fwx(i)
tfy(i) = fcy(i) + fby(i) + fay(i) + fwy(i)
! sum all moments on particule i together
m(i) = mc(i) + mb(i) + ma(i) + mw(i)
end do
!$omp end parallel do
! forces on side particles for experiments
call experiment_forces
! integration in time
call velocity
call euler
end subroutine stepper
rel_pos_vel(i, j)
computes relative positions, velocities, angles, etc. between particlesi
andj
. Everything inside is a 2D array from common blocks.bond_properties(i, j)
same thing herecontact_forces(i, j)
some local variables as well as some arrays from common blocksbond_forces(i, j)
same thingsheltering
andforcing
same
I tried to put different variables (the ones wthat are being changed tfx, tfy, m, etc.) in the private state, but without success. I don't think I understand well the reduction tag as well. Also, I know that some constant arrays like the radius r(i)
are accessed by multiple threads, but I don't know how to deal with that.
Actually, when printing the number of threads in the parallel region (using omp_get_num_thread()
), the output is 1, indicating that no parallelization is taking place.
Also, when trying to parallelize the first initialization loop as a first step towards understanding:
!$omp do private(i) schedule(dynamic) num_threads(10)
print*, omp_get_num_threads()
do i = 1, n
mc(i) = 0d0
mb(i) = 0d0
fcx(i) = 0d0
fcy(i) = 0d0
fbx(i) = 0d0
fby(i) = 0d0
end do
!$omp parallel do
it is still not working, the output of omp_get_num_threads
is 1. And the number of threads is explicitly given to be 10.
I am compiling using gfortran, with the flags -ffast-math
, -O3
, -fopenmp