
While developing some Java software, I asked here (Java Concurrent - no speedUp gained LU algorithm - False sharing?) why I get no speedup when parallelizing this code using CyclicBarrier.

    public void decompose() {
        int n = A.length;
        for (int k = 0; k < n - 1; k++) {
            for (int i = k + 1; i < n; i++) {
                A[i][k] = A[i][k] / A[k][k];
                for (int j = k + 1; j < n; j++) {
                    A[i][j] = A[i][j] - A[i][k] * A[k][j];
                }
            }
        }
        decomposed = true;
    }

The algorithm basically performs Gaussian elimination on the matrix.

After some discussion (if you're interested, just see the comments), a user (brettw) replied with this solution using the Java Fork/Join framework:

    public void decompose()
    {
        final Semaphore semaphore = new Semaphore(0);

        class Decompose extends RecursiveAction {
            private final int k;

            Decompose(int k) {
                this.k = k;
            }

            protected void compute() {
                final int n = A.length;
                for (int i = k + 1; i < n; i++) {
                    A[i][k] = A[i][k] / A[k][k];
                    for (int j = k + 1; j < n; j++) {
                        A[i][j] = A[i][j] - A[i][k] * A[k][j];
                    }
                }

                semaphore.release();
            }
        }

        ForkJoinPool mainPool = new ForkJoinPool();
        for (int k = 0; k < A.length - 1; k++) {
            mainPool.execute(new Decompose(k));
        }
        semaphore.acquireUninterruptibly(A.length - 1);
    }

The problem is that this algorithm doesn't produce the expected result, since there is no synchronization between the workers: every row has to be fully updated before k can be incremented.

My question is:

What kind of synchronization strategy would you suggest, given that I cannot foresee the number of threads/workers?
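For context on the dependency: within a fixed k, the row updates are independent of each other; only the step-to-step order matters. A minimal sketch of that idea (my assumption, using a parallel stream per step instead of the original CyclicBarrier; the terminal operation of the stream acts as the barrier):

```java
import java.util.stream.IntStream;

public class ParallelLU {
    // Reference: sequential in-place LU decomposition (Doolittle, no pivoting).
    static void decomposeSequential(double[][] A) {
        int n = A.length;
        for (int k = 0; k < n - 1; k++) {
            for (int i = k + 1; i < n; i++) {
                A[i][k] = A[i][k] / A[k][k];
                for (int j = k + 1; j < n; j++) {
                    A[i][j] -= A[i][k] * A[k][j];
                }
            }
        }
    }

    // Parallel variant: the k-loop stays sequential (step k+1 depends on step k),
    // but the independent row updates inside each step run in parallel.
    // forEach on a parallel stream blocks until all rows are done, so it
    // behaves as a per-step barrier; row k itself is only read, never written.
    static void decomposeParallel(double[][] A) {
        int n = A.length;
        for (int k = 0; k < n - 1; k++) {
            final int kk = k; // effectively final copy for the lambda
            IntStream.range(kk + 1, n).parallel().forEach(i -> {
                A[i][kk] = A[i][kk] / A[kk][kk];
                for (int j = kk + 1; j < n; j++) {
                    A[i][j] -= A[i][kk] * A[kk][j];
                }
            });
        }
    }

    public static void main(String[] args) {
        double[][] a = {{4, 3, 2}, {8, 7, 9}, {6, 5, 4}};
        double[][] b = {{4, 3, 2}, {8, 7, 9}, {6, 5, 4}};
        decomposeSequential(a);
        decomposeParallel(b);
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a.length; j++) {
                if (Math.abs(a[i][j] - b[i][j]) > 1e-9) {
                    throw new AssertionError("mismatch at " + i + "," + j);
                }
            }
        }
        System.out.println("ok");
    }
}
```

Whether this actually speeds things up depends on the matrix size; for small n the per-step fork/join overhead can dominate the row work.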

BlacK
  • Just FYI for the future: http://english.stackexchange.com/questions/52418/is-a-software-really-never-correct – Sotirios Delimanolis Dec 06 '13 at 17:11
  • Never too late to learn more of the English language! – BlacK Dec 06 '13 at 17:17
  • I think this is not the correct way to parallelize this algorithm, because the n-th iteration requires the results (matrix changes) of the (n-1)th iteration. You should find another parallelization strategy, I suppose. – isnot2bad Dec 06 '13 at 20:20
  • Exactly, that is the problem: *every row has to be fully updated before k can be incremented*, but really I don't know how... I tried with an AtomicInteger shared by all the threads, but it doesn't seem to work well (minimal speedup) – BlacK Dec 06 '13 at 20:34
  • Right, the obvious ways of splitting the problem don't work, because they rely on being able to split the problem into independent chunks. Parallelizing this seems quite subtle and requires some rearrangement of the algorithm. A web search for "parallel gaussian elimination" reveals some interesting papers and lectures, including this one: http://www.cs.berkeley.edu/~yelick/cs267_sp07/lectures/lecture12/lecture12_densela2_jd07.pdf – Stuart Marks Dec 07 '13 at 02:56
  • In other words, this is not a problem that can be solved by figuring out how to do synchronization better. You need to figure out how to decompose the problem in a different way so that threads can do more work independently. The presentation I linked to above seems to cover that. – Stuart Marks Dec 07 '13 at 02:57
  • As I wrote in your last question: "ForkJoin isn't magically making things faster", you need to actually split your work for that. But since you are reading and writing over all cells, that's going to be very difficult. – TwoThe Dec 08 '13 at 11:03

1 Answer


You are not using the fork/join pool the way it was intended. You need to split the work into small sections, compute() each section, and combine the results.

compute() needs code such as:

    if (work < max) {
        left  = split left half
        right = split right half
        left.fork()
        right.compute()
        left.join()
    }

Without splitting the work you are not going to use multiple threads. Using a semaphore single-threads the work, and you will never see a speedup.

Look at the examples in the API for using this framework.
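The split-then-join pattern above can be sketched concretely for this problem: for a fixed k, the rows below the pivot are independent, so a RecursiveAction can divide the row range in half until it is small enough to process directly. (The class name and THRESHOLD value are illustrative choices, not from the answer; the outer k-loop still has to stay sequential.)

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class RowUpdateTask extends RecursiveAction {
    static final int THRESHOLD = 64; // rows handled directly; an assumed tuning value
    final double[][] A;
    final int k, lo, hi; // eliminate column k over rows [lo, hi)

    RowUpdateTask(double[][] A, int k, int lo, int hi) {
        this.A = A; this.k = k; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: update the rows sequentially.
            for (int i = lo; i < hi; i++) {
                A[i][k] /= A[k][k];
                for (int j = k + 1; j < A.length; j++) {
                    A[i][j] -= A[i][k] * A[k][j];
                }
            }
        } else {
            int mid = (lo + hi) >>> 1;
            RowUpdateTask left  = new RowUpdateTask(A, k, lo, mid);
            RowUpdateTask right = new RowUpdateTask(A, k, mid, hi);
            left.fork();     // run the left half asynchronously
            right.compute(); // work on the right half in this thread
            left.join();     // both halves done before returning
        }
    }

    public static void decompose(double[][] A) {
        ForkJoinPool pool = ForkJoinPool.commonPool();
        int n = A.length;
        for (int k = 0; k < n - 1; k++) {          // steps stay sequential
            pool.invoke(new RowUpdateTask(A, k, k + 1, n)); // invoke() waits
        }
    }

    public static void main(String[] args) {
        double[][] A = {{2, 1}, {4, 5}};
        decompose(A);
        // Expected in-place result: L multiplier 2.0 below the diagonal,
        // U diagonal entry 5 - 2*1 = 3.0.
        if (A[1][0] != 2.0 || A[1][1] != 3.0) throw new AssertionError();
        System.out.println("ok");
    }
}
```

invoke() blocks until the whole task tree for that k finishes, which supplies the per-step synchronization the semaphore version was missing.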

edharned
  • Yes, the algorithm should be "re-engineered" and probably the fork-join is not what I'm looking for. – BlacK Dec 13 '13 at 18:25