During developing a Java software, I asked here Java Concurrent - no speedUp gained LU algorithm - False sharing? why I don't have no speed up parallelizing this code using CyclicBarrier.
public void decompose(){
int n = A.length;
for(int k=0; k<n-1;k++) {
for(int i=k+1; i<n; i++) {
A[i][k] = A[i][k]/A[k][k];
for(int j=k+1; j<n; j++) {
A[i][j] = A[i][j] - A[i][k] * A[k][j];
}
}
}
decomposed = true;
}
The algorithm basically do the gaussian reduction of the matrix
After some discussion (if you interested just see the comments), an user (brettw) replied with this solution using the Fork/Join Java Framework:
public void decompose()
{
final Semaphore semaphore = new Semaphore(0);
class Decompose extends RecursiveAction {
private final int k;
Decompose(int k) {
this.k = k;
}
protected void compute() {
final int n = A.length;
for (int i = k + 1; i < n; i++) {
A[i][k] = A[i][k] / A[k][k];
for (int j = k + 1; j < n; j++) {
A[i][j] = A[i][j] - A[i][k] * A[k][j];
}
}
semaphore.release();
}
}
ForkJoinPool mainPool = new ForkJoinPool();
for (int k = 0; k < A.length - 1; k++) {
mainPool.execute(new Decompose(k));
}
semaphore.acquireUninterruptibly(A.length - 1);
}
The problem is that this algorithm doesn't produce the expected result since there is no synchronization with the worker (every row have to be update the all the value for increment the k
value).
My question is:
What kind of synchronization strategy you would suggest since I cannot foresee the number of threads/workers?