    jx=1 ! counter variable
    ! squeeze out zeros from k
    
    !$omp parallel do
    do ix=1,tdof
    do iy=1,tdof
       if (iy.ge.ix) then ! Upper triangular elements
         if ((abs(K(ix,iy))-0.d0)>1e-10) then ! Non zero entries
              irndum(jx)=ix
              jcndum(jx)=iy
              Adum(jx)=K(ix,iy)
              jx=jx+1
         endif
       endif
    enddo
    enddo
!$omp end parallel do

Hi all, the above code filters the non-zero elements in the upper triangular portion of the matrix. The value of tdof is large (about 40k), so I want to parallelize the loop using OpenMP. Any suggestions on how to categorize the variables as private or shared, especially the jx variable? The arrays irndum, jcndum and Adum are allocatable arrays that I allocated with some size beforehand. Thanks in advance.

bharat
  • The problem is that this loop nest has an inherent ordering, namely the increment of jx, and loops with ordering are not the best candidates for parallelisation. I'm not saying it can't be parallelised; you probably will have to make thread-local lists and then merge them. But before I attempt that, one important question: do you care if the parallel version produces the results in exactly the same order as the serial one, or can it be the same numbers, just in a different order? – Ian Bush Jul 16 '21 at 07:32
  • @IanBush, I need the ordering to match the serial version. – bharat Jul 16 '21 at 07:53
  • Then you have a difficult problem which may not parallelise well unless K is very sparse - because it is inherently not parallel. – Ian Bush Jul 16 '21 at 08:00
  • Hi @IanBush, K is indeed a highly sparse matrix. I need these elements because the solver I am using requires A as a list of nonzero entries rather than the full matrix when solving Ax=b. – bharat Jul 16 '21 at 08:14
  • Unrelated comment: If you just use the upper triangular part of the matrix, the second loop should be between `ix` and `tdof`. In this case you do not need the first `if`. It will improve the speed of your program. – Laci Jul 18 '21 at 07:43
  • As @Ian mentioned, this problem can be parallelized, but it may not increase the overall speed of your program. The most efficient solution would be to do the parallelization manually; this way you can make sure the chunks of work are distributed in the correct order. Collect the indices (`ix`, `iy` values) in separate arrays for each thread (e.g. use a 2-dimensional array in which one dimension is the thread number). This way the ordering can be maintained, and after that you can copy the indices to your arrays (`irndum` and `jcndum`) and fill `Adum`. I do not know Fortran well enough to provide you the code. – Laci Jul 18 '21 at 08:40 (see the sketch after these comments)
  • Does this answer your question? [Sparse matrix parallel creation with openmp in fortran](https://stackoverflow.com/questions/70304763/sparse-matrix-parallel-creation-with-openmp-in-fortran) – veryreverie Dec 10 '21 at 17:46
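
For reference, here is a rough sketch of the thread-local-buffer idea Laci describes above, assuming `K`, `tdof`, `irndum`, `jcndum` and `Adum` are declared as in the question. The work arrays `cnt`, `li`, `lj`, `la` and the bound `maxnnz` (an upper limit on the nonzeros one thread can collect, e.g. `size(Adum)`) are hypothetical helpers introduced only for this illustration, not part of the original post.

    use omp_lib
    integer :: ix, iy, t, nt, lo, hi, n, pos
    integer, allocatable :: cnt(:)            ! nonzeros found by each thread
    integer, allocatable :: li(:,:), lj(:,:)  ! per-thread row/column buffers
    double precision, allocatable :: la(:,:)  ! per-thread value buffers

    nt = omp_get_max_threads()
    allocate(cnt(0:nt-1), li(maxnnz,0:nt-1), lj(maxnnz,0:nt-1), la(maxnnz,0:nt-1))

    !$omp parallel private(t, ix, iy, lo, hi, n)
    t  = omp_get_thread_num()
    ! manual block decomposition: thread t handles the contiguous rows lo..hi
    lo = 1 + (t * tdof) / nt
    hi = ((t + 1) * tdof) / nt
    n  = 0
    do ix = lo, hi
       do iy = ix, tdof                       ! upper triangle only
          if (abs(K(ix,iy)) > 1.d-10) then    ! nonzero entry
             n = n + 1
             li(n,t) = ix
             lj(n,t) = iy
             la(n,t) = K(ix,iy)
          end if
       end do
    end do
    cnt(t) = n
    !$omp end parallel

    ! merge the per-thread lists in thread order; because each thread owns a
    ! contiguous block of rows, this reproduces the serial ordering exactly
    pos = 1
    do t = 0, nt-1
       irndum(pos:pos+cnt(t)-1) = li(1:cnt(t), t)
       jcndum(pos:pos+cnt(t)-1) = lj(1:cnt(t), t)
       Adum(pos:pos+cnt(t)-1)   = la(1:cnt(t), t)
       pos = pos + cnt(t)
    end do

The drawback is the per-thread buffer of `maxnnz` entries for every thread; the count-then-fill approach in the answer below needs far less extra memory.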

1 Answer


This can be perfectly parallelized, but it takes a little code massaging.

Duplicate your loop. First make an array of length the matrix size; evaluate the nonzero test on K for all i,j and record how many elements you need to fill, storing the count for row i in location i. This pass is perfectly parallel.

Now you know how much space you need, and a prefix sum of those counts gives each thread its starting point in the storage. Go through the whole loop again and actually fill in the elements, with jx local to each thread.

Ok, so this doubles the amount of scalar work. But you make it completely parallel, so you don't really care, right?
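
A minimal sketch of that two-pass approach, again assuming `K`, `tdof`, `irndum`, `jcndum` and `Adum` from the question; the helper arrays `rowcnt` and `rowstart` are introduced here only for illustration.

    integer :: ix, iy, jx, nnz
    integer, allocatable :: rowcnt(:), rowstart(:)

    allocate(rowcnt(tdof), rowstart(tdof))

    ! Pass 1: count the nonzeros in the upper-triangular part of each row.
    ! There is no ordering dependence, so the rows can be counted in parallel.
    !$omp parallel do private(ix, iy)
    do ix = 1, tdof
       rowcnt(ix) = 0
       do iy = ix, tdof
          if (abs(K(ix,iy)) > 1.d-10) rowcnt(ix) = rowcnt(ix) + 1
       end do
    end do
    !$omp end parallel do

    ! Serial prefix sum: rowstart(ix) is where row ix starts in the output arrays.
    rowstart(1) = 1
    do ix = 2, tdof
       rowstart(ix) = rowstart(ix-1) + rowcnt(ix-1)
    end do
    nnz = rowstart(tdof) + rowcnt(tdof) - 1    ! total number of stored entries

    ! Pass 2: fill. Every row writes into its own slice of irndum/jcndum/Adum,
    ! so jx is private to each thread and the serial ordering is preserved.
    !$omp parallel do private(ix, iy, jx)
    do ix = 1, tdof
       jx = rowstart(ix)
       do iy = ix, tdof
          if (abs(K(ix,iy)) > 1.d-10) then
             irndum(jx) = ix
             jcndum(jx) = iy
             Adum(jx)   = K(ix,iy)
             jx = jx + 1
          end if
       end do
    end do
    !$omp end parallel do

Because the early rows of the upper triangle contain more columns than the later ones, a `schedule(dynamic)` or `schedule(guided)` clause on both parallel loops may balance the work better than the default static schedule.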

Victor Eijkhout