
I am trying to implement the Strassen matrix multiplication algorithm in both a sequential and a parallel version. I want the following code to run in parallel, but I have no experience with parallel programming.

int[][] M1 = multiply(add(A11, A22), add(B11, B22));
int[][] M2 = multiply(add(A21, A22), B11);
int[][] M3 = multiply(A11, sub(B12, B22));
int[][] M4 = multiply(A22, sub(B21, B11));
int[][] M5 = multiply(add(A11, A12), B22);
int[][] M6 = multiply(sub(A21, A11), add(B11, B12));
int[][] M7 = multiply(sub(A12, A22), add(B21, B22));

I searched the web, found a possible solution, and rearranged the code as follows:

ExecutorService executor = Executors.newCachedThreadPool();
List<FutureTask<int[][]>> taskList1 = new ArrayList<FutureTask<int[][]>>();
// Wrap each of the seven Strassen products (M1..M7) in its own task
FutureTask<int[][]> futureTask_2 = new FutureTask<int[][]>(new Callable<int[][]>() {
    @Override
    public int[][] call() throws InterruptedException, ExecutionException {
        return multiply(add(A11, A22), add(B11, B22));
    }
});
FutureTask<int[][]> futureTask_3 = new FutureTask<int[][]>(new Callable<int[][]>() {
    @Override
    public int[][] call() throws InterruptedException, ExecutionException {
        return multiply(add(A21, A22), B11);
    }
});
FutureTask<int[][]> futureTask_4 = new FutureTask<int[][]>(new Callable<int[][]>() {
    @Override
    public int[][] call() throws InterruptedException, ExecutionException {
        return multiply(A11, sub(B12, B22));
    }
});
FutureTask<int[][]> futureTask_5 = new FutureTask<int[][]>(new Callable<int[][]>() {
    @Override
    public int[][] call() throws InterruptedException, ExecutionException {
        return multiply(A22, sub(B21, B11));
    }
});
FutureTask<int[][]> futureTask_6 = new FutureTask<int[][]>(new Callable<int[][]>() {
    @Override
    public int[][] call() throws InterruptedException, ExecutionException {
        return multiply(add(A11, A12), B22);
    }
});
FutureTask<int[][]> futureTask_7 = new FutureTask<int[][]>(new Callable<int[][]>() {
    @Override
    public int[][] call() throws InterruptedException, ExecutionException {
        return multiply(sub(A21, A11), add(B11, B12));
    }
});
FutureTask<int[][]> futureTask_8 = new FutureTask<int[][]>(new Callable<int[][]>() {
    @Override
    public int[][] call() throws InterruptedException, ExecutionException {
        return multiply(sub(A12, A22), add(B21, B22));
    }
});
taskList1.add(futureTask_2);
taskList1.add(futureTask_3);
taskList1.add(futureTask_4);
taskList1.add(futureTask_5);
taskList1.add(futureTask_6);
taskList1.add(futureTask_7);
taskList1.add(futureTask_8);
executor.execute(futureTask_2);
executor.execute(futureTask_3);
executor.execute(futureTask_4);
executor.execute(futureTask_5);
executor.execute(futureTask_6);
executor.execute(futureTask_7);
executor.execute(futureTask_8);

FutureTask<int[][]> ftrTask = taskList1.get(0);
final int[][] M1 = ftrTask.get();
FutureTask<int[][]> ftrTask1 = taskList1.get(1);
final int[][] M2 = ftrTask1.get();
FutureTask<int[][]> ftrTask2 = taskList1.get(2);
final int[][] M3 = ftrTask2.get();
FutureTask<int[][]> ftrTask3 = taskList1.get(3);
final int[][] M4 = ftrTask3.get();
FutureTask<int[][]> ftrTask4 = taskList1.get(4);
final int[][] M5 = ftrTask4.get();
FutureTask<int[][]> ftrTask5 = taskList1.get(5);
final int[][] M6 = ftrTask5.get();
FutureTask<int[][]> ftrTask6 = taskList1.get(6);
final int[][] M7 = ftrTask6.get();

executor.shutdown();

When I run the program on small matrices (dimensions 2, 4, 8, 16), it takes almost the same time as the sequential version. For large dimensions such as 100 or 1000, it takes much longer than the sequential version. Is my parallel implementation wrong?
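One possible cause, assuming `multiply` is the recursive Strassen routine and the executor code above sits inside it: every recursion level then creates a new cached thread pool and seven more threads, so the thread count grows as a power of seven and the scheduling overhead outweighs the arithmetic. A minimal sketch of an alternative is below. It submits the seven products to one fixed-size pool at the top level only, runs deeper levels sequentially, and falls back to plain multiplication below a cutoff. The class name `StrassenSketch`, the helper names, and the cutoff value 64 are assumptions made for illustration, and the sketch assumes square matrices whose size is a power of two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StrassenSketch {
    // Assumed cutoff: at or below this size, plain multiplication is used.
    static final int CUTOFF = 64;

    // Returns x + sign * y for square matrices of equal size.
    static int[][] combine(int[][] x, int[][] y, int sign) {
        int n = x.length;
        int[][] r = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                r[i][j] = x[i][j] + sign * y[i][j];
        return r;
    }
    static int[][] add(int[][] x, int[][] y) { return combine(x, y, 1); }
    static int[][] sub(int[][] x, int[][] y) { return combine(x, y, -1); }

    // Plain triple-loop multiplication, used as the base case.
    static int[][] naive(int[][] a, int[][] b) {
        int n = a.length;
        int[][] c = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // Copies the h-by-h quadrant whose top-left corner is (r, c).
    static int[][] quad(int[][] m, int r, int c, int h) {
        int[][] q = new int[h][h];
        for (int i = 0; i < h; i++) System.arraycopy(m[r + i], c, q[i], 0, h);
        return q;
    }

    // pool == null means: compute the seven products in the calling thread.
    static int[][] strassen(int[][] a, int[][] b, ExecutorService pool) throws Exception {
        int n = a.length;
        if (n <= CUTOFF) return naive(a, b);
        int h = n / 2;
        int[][] a11 = quad(a, 0, 0, h), a12 = quad(a, 0, h, h),
                a21 = quad(a, h, 0, h), a22 = quad(a, h, h, h);
        int[][] b11 = quad(b, 0, 0, h), b12 = quad(b, 0, h, h),
                b21 = quad(b, h, 0, h), b22 = quad(b, h, h, h);

        // Recursive calls pass null, so only the top level creates tasks.
        List<Callable<int[][]>> tasks = new ArrayList<>();
        tasks.add(() -> strassen(add(a11, a22), add(b11, b22), null)); // M1
        tasks.add(() -> strassen(add(a21, a22), b11, null));           // M2
        tasks.add(() -> strassen(a11, sub(b12, b22), null));           // M3
        tasks.add(() -> strassen(a22, sub(b21, b11), null));           // M4
        tasks.add(() -> strassen(add(a11, a12), b22, null));           // M5
        tasks.add(() -> strassen(sub(a21, a11), add(b11, b12), null)); // M6
        tasks.add(() -> strassen(sub(a12, a22), add(b21, b22), null)); // M7

        List<int[][]> m = new ArrayList<>();
        if (pool == null) {
            for (Callable<int[][]> t : tasks) m.add(t.call());
        } else {
            // invokeAll returns the futures in the same order as the tasks.
            for (Future<int[][]> f : pool.invokeAll(tasks)) m.add(f.get());
        }
        int[][] m1 = m.get(0), m2 = m.get(1), m3 = m.get(2), m4 = m.get(3),
                m5 = m.get(4), m6 = m.get(5), m7 = m.get(6);

        // Assemble the four quadrants of the result.
        int[][] c = new int[n][n];
        for (int i = 0; i < h; i++)
            for (int j = 0; j < h; j++) {
                c[i][j] = m1[i][j] + m4[i][j] - m5[i][j] + m7[i][j];
                c[i][j + h] = m3[i][j] + m5[i][j];
                c[i + h][j] = m2[i][j] + m4[i][j];
                c[i + h][j + h] = m1[i][j] - m2[i][j] + m3[i][j] + m6[i][j];
            }
        return c;
    }

    // Entry point: one fixed pool, created once and shut down afterwards.
    static int[][] multiplyParallel(int[][] a, int[][] b) {
        int threads = Math.min(7, Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            return strassen(a, b, pool);
        } catch (Exception e) {
            throw new IllegalStateException("parallel multiply failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `StrassenSketch.multiplyParallel(A, B)` replaces the top-level `multiply(A, B)` call. Whether this beats the sequential version still depends on the matrix size, the cutoff, and the number of cores, so it is worth timing several cutoff values rather than trusting the value used here. Dimensions such as 100 or 1000 are not powers of two and would need padding first, which the sketch does not do.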

user997248
  • Naive matrix multiplication run in parallel will most likely be faster than Strassen's; Strassen's is the best algorithm for a single thread. :) – oz10 May 08 '14 at 21:35
  • So are you saying that a parallel implementation of naive matrix multiplication is faster than a parallel implementation of Strassen? :) – user997248 May 08 '14 at 21:37
  • Yes. YMMV depending on the execution environment. I don't want to discourage you from trying it, but I would put $20 on being right. :) – oz10 May 08 '14 at 21:39
  • My experience is with distributed clusters. If you're running on a single box, this may not be the case. The crux of the issue is data dependencies. I believe there are a lot of them with Strassen, which means you'll need to communicate intermediate results: on a distributed cluster that means network I/O, on a multicore CPU that means cache coherence conflicts, etc. – oz10 May 08 '14 at 21:41
  • Are you just looking for a fast way to do parallel matrix multiplication in Java? Would a pure Java framework be of any interest to you? – edharned May 08 '14 at 22:23
  • Yes, actually I want to learn how the code at the top can be implemented so that it runs in parallel. – user997248 May 09 '14 at 11:26
  • I maintain an open-source project on SourceForge that has a built-in function for matrix multiplication. You can use it as is or roll your own. See here: http://sourceforge.net/projects/tymeacdse/ – edharned May 09 '14 at 13:58
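
For comparison with the suggestion in the comments that naive multiplication parallelizes well, here is a minimal sketch of a row-parallel naive multiplication. Each output row depends only on one row of the left matrix and all of the right matrix, so rows can be computed independently with no intermediate results to share. The class name `ParallelNaiveSketch` is an assumption made for illustration, and the parallel stream runs on the JVM's common fork/join pool rather than a pool chosen here.

```java
import java.util.stream.IntStream;

class ParallelNaiveSketch {
    // Multiplies an n-by-inner matrix by an inner-by-m matrix, one row per task.
    static int[][] multiply(int[][] a, int[][] b) {
        int n = a.length, inner = b.length, m = b[0].length;
        int[][] c = new int[n][m];
        IntStream.range(0, n).parallel().forEach(i -> {
            int[] ci = c[i]; // each task writes only to its own output row
            int[] ai = a[i];
            for (int k = 0; k < inner; k++) {
                int aik = ai[k];
                int[] bk = b[k];
                for (int j = 0; j < m; j++) ci[j] += aik * bk[j];
            }
        });
        return c;
    }
}
```

Because no two tasks write to the same row, no locking is needed. Which approach is faster for a given size and machine is something to measure, not assume.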

0 Answers