I am trying to run a vortex simulation code in parallel using OpenMP. It is similar to a particle simulation: at each time step, the position of a vortex at the next time step is computed from its velocity, which is determined by the positions of all the other vortices at the current time step. Vortices are deleted once they leave the domain. I compare the number of vortices at each time step between the parallel and serial versions of the code, and run each version multiple times.
For the serial version, vortex counts match exactly at every time step across runs. In the parallel case, all runs match the serial case for a few tens of time steps, after which each parallel run shows a difference but stays within a 7-10% error bound of the serial case (as can be seen in the result linked below). I know this may be caused by round-off error in the parallel case, arising from the different order of computational steps when the work is distributed among threads, and since each vortex's velocity depends on the positions of all the others, I expect a tiny difference at one step to grow over subsequent steps. But should the error really be as high as 10%?
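To illustrate the order dependence I mean: floating-point addition is not associative, so summing the same contributions in a different order gives a slightly different result. A minimal standalone sketch (not part of my simulation code), where the analogous sum in my code is the velocity accumulation over all vortices:

program sum_order
    implicit none
    integer, parameter :: n = 100000
    real(8) :: x(n), s_fwd, s_bwd
    integer :: i
    call random_number(x)
    x = x - 0.5d0          ! signed contributions, like induced velocities
    s_fwd = 0.0d0
    do i = 1, n            ! sum in forward order
        s_fwd = s_fwd + x(i)
    end do
    s_bwd = 0.0d0
    do i = n, 1, -1        ! same numbers, reverse order
        s_bwd = s_bwd + x(i)
    end do
    print *, 'forward - backward =', s_fwd - s_bwd   ! generally nonzero
end program sum_order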
I have only used the reduction clause in a parallel do construct. The only parallel region in the whole code is inside the function vblob(), which sits in a module that I call from the main code. The only functions called within vblob(), ixi() and fxi(), are defined outside this module:
function vblob(blobs, xj, gj)
    ! del, j (presumably the imaginary unit) and pi are host-associated
    ! from the enclosing module; ixi() and fxi() are external to it.
    complex(8), intent(in) :: blobs(:,:), xj
    real(8), intent(in) :: gj
    real(8) :: vblob(2)
    complex(8) :: delxi, delxic, di, gvic, xi
    integer :: p

    gvic = 0.0; delxi = 0.0; delxic = 0.0; di = 0.0; xi = 0.0
    ! Each thread accumulates a private copy of gvic; the partial sums
    ! are combined by the reduction at the end of the loop.
    !$omp parallel do private(xi, delxic, delxi, di) shared(xj) reduction(+:gvic)
    do p = 1, size(blobs,1)
        xi = ixi(blobs(p,1))
        delxic = xj - conjg(xi)   ! offset from the conjugate (image) point
        delxi = xj - xi           ! offset from the vortex itself
        di = del*fxi(xi)          ! (di is not used further in this excerpt)
        gvic = gvic + real(blobs(p,2))/delxic
        if (abs(delxi) > 1.E-4) then   ! skip the near-singular self term
            gvic = gvic - real(blobs(p,2))/delxi
        end if
    end do
    !$omp end parallel do
    gvic = j*gvic*fxi(xj)/(2*pi)
    vblob(1) = real(gvic)
    vblob(2) = -aimag(gvic)
end function vblob
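For context, the reduction gives each thread a private copy of gvic and combines the per-thread partial sums at the end, in an order the OpenMP standard leaves unspecified. A minimal standalone sketch (hypothetical names, not my simulation code) of the effect:

program reduction_test
    implicit none
    integer, parameter :: n = 100000
    real(8) :: x(n), s_serial, s_par
    integer :: i
    call random_number(x)
    x = x - 0.5d0
    s_serial = 0.0d0
    do i = 1, n              ! plain serial sum, fixed order
        s_serial = s_serial + x(i)
    end do
    s_par = 0.0d0
    !$omp parallel do reduction(+:s_par)
    do i = 1, n              ! per-thread partial sums, combined at the end
        s_par = s_par + x(i)
    end do
    !$omp end parallel do
    ! The difference is at round-off level, and it can change from run to
    ! run because the combination order of the partial sums is unspecified.
    print *, 's_serial - s_par =', s_serial - s_par
end program reduction_test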
If the way I have constructed the parallel code were wrong, errors should show up within the first few time steps, right?
(As can be seen in the result, 'blobs' and 'sheets' are just types of vortex elements, and the blue line is the total number of elements. P and S stand for parallel and serial respectively, and R stands for run. The solid plot markers are the serial code and the hollow ones are the three runs of the parallel code.)
EDIT: When I change the numerical precision of my variables to real(4) instead, the divergence in results happens at an earlier time step than in the real(8) case above. So it is clearly a round-off error issue.
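A variant of the earlier sketch (again standalone, not my simulation code) illustrates why: the same order-dependent round-off is orders of magnitude larger in single precision:

program precision_test
    implicit none
    integer, parameter :: n = 100000
    real(8) :: xd(n), d_fwd, d_bwd
    real(4) :: xs(n), s_fwd, s_bwd
    integer :: i
    call random_number(xd)
    xd = xd - 0.5d0
    xs = real(xd, 4)       ! same data, single precision
    d_fwd = 0.0d0; d_bwd = 0.0d0
    s_fwd = 0.0; s_bwd = 0.0
    do i = 1, n            ! forward order, both precisions
        d_fwd = d_fwd + xd(i)
        s_fwd = s_fwd + xs(i)
    end do
    do i = n, 1, -1        ! reverse order, both precisions
        d_bwd = d_bwd + xd(i)
        s_bwd = s_bwd + xs(i)
    end do
    print *, 'real(8) order effect:', d_fwd - d_bwd   ! tiny
    print *, 'real(4) order effect:', s_fwd - s_bwd   ! much larger
end program precision_test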
TL;DR: Has anyone else seen such a result over a range of time steps, where the parallel code matches the serial code for the first few time steps and then diverges?