
The code below computes the inverse of a matrix with the Gauss-Jordan method, with the `k` loop unrolled by two to halve memory accesses. This improves single-thread execution time. The problem I'm having is that the unrolling creates new data dependencies that prevent me from parallelizing, for example either the `k` loop or the `i` loop (the one with the `if (i != k ...)` conditions).

    for (k = 0; k < size; k += 2)
    {
        pivot = original[k][k];
        for (j = 0; j < size; j++)
        {
            original[k][j] /= pivot;
            inverse[k][j] /= pivot;
        }

        pivot = original[k + 1][k];
        for (i = 0; i < size; i++)
        {
            original[k + 1][i] -= original[k][i] * pivot;
            inverse[k + 1][i] -= inverse[k][i] * pivot;
        }

        pivot = original[k + 1][k + 1];
        for (j = 0; j < size; j++)
        {
            original[k + 1][j] /= pivot;
            inverse[k + 1][j] /= pivot;
        }

        for (i = 0; i < size; i++)
        {
            if (i != k && i != k + 1)
            {
                pivot = original[i][k];
                for (j = 0; j < size; j++)
                {
                    original[i][j] -= original[k][j] * pivot;
                    inverse[i][j] -= inverse[k][j] * pivot;
                }
            }

            if (i != k + 1)
            {
                pivot = original[i][k + 1];
                for (j = 0; j < size; j++)
                {
                    original[i][j] -= original[k + 1][j] * pivot;
                    inverse[i][j] -= inverse[k + 1][j] * pivot;
                }
            }
        }
    }

I suppose the code will need some transformation to eliminate these data dependencies, but surely it is parallelizable.

subject25
  • Does this [paper](https://www.researchgate.net/publication/224261066_Open_Multi_Processing_OpenMP_of_Gauss-Jordan_Method_for_Solving_System_of_Linear_Equations#fullTextFileContent) about OpenMP pipelining algorithm of Gauss-Jordan help? – Laci Jan 28 '23 at 20:04

1 Answer


You need to analyze which dependencies are essential and what is parallel. As in all linear algebra factorization algorithms, there is a sequential loop that computes the pivots. That's your `k` loop, and it cannot be parallelized. (Maybe with some exceptional cleverness, but that would probably require a large rewrite.)

Next there is the double `i,j` loop, where every write access goes to a location `[i][j]`, meaning that it is perfectly parallel. That loop is also much larger than the single loops that come before it, though it does depend on them.

So start by making the double i,j loop parallel, and see what kind of speedup you get. You can then try to make the single loops parallel, or keep them single-threaded. They may be too small to give you any sort of gain.

But no transformation of the code is required: insert a single `omp parallel for` on that double loop and you're probably done. It may be possible to rewrite the conditionals so that you can collapse that double loop.

EDIT: you've coded the algorithm somewhat strangely. The update loop should only have an exception for `k`, the pivot row. You can rewrite it so that the `k + 1` case disappears.

Victor Eijkhout
  • Yes, the first thing I tried was `omp parallel for collapse(2)`, but the `if` conditions prevent me from doing it. The matrix size I am using is 2500 x 2500. The upper loops are small and I don't get any significant time improvement. – subject25 Jan 28 '23 at 20:37
  • @Laci Without the `collapse` clause I get a 1.25x improvement, but the result is not correct due to data dependencies – subject25 Jan 29 '23 at 09:35
  • You're right, you cannot collapse those two. But the outer of the two loops (the one on `i`) should be parallelizable. – Victor Eijkhout Jan 29 '23 at 14:24
  • @subject25 Did you make `j` and `pivot` private? – Laci Jan 29 '23 at 15:20
  • @Laci Excellent point. OP: always declare your loop variables in the loop header, not outside. (Remark aside: it seems like every other OMP question here suffers from that problem. Where do people learn this bad habit?) – Victor Eijkhout Jan 29 '23 at 16:35
  • Yes, you can also set the `j` and `pivot` variables as `private`. But the result is still wrong. – subject25 Jan 30 '23 at 10:55
  • @subject25 It should work. Can you show us a [mre]? – Laci Jan 30 '23 at 16:29
  • @Laci [link](https://prnt.sc/FYk-hxA0Kxfq). The capture shows the execution for a 4x4 matrix, i.e. two groups of iterations, k = 0, k+1 = 1 and k = 2, k+1 = 3 (the k loop is unrolled). For k = 0 and k+1 = 1, iteration i = 0 must be done before iterations i = 2 and i = 3 (data dependency). For k = 2 and k+1 = 3, i = 0 and i = 1 must be done before i = 2, since the old row is needed before editing it. This is the problem: with 4 threads, for example, the result would be wrong. – subject25 Jan 30 '23 at 17:25
  • Would it be solved using `pragma omp single`? – subject25 Jan 31 '23 at 09:57