4

I'm writing an implementation of the conjugate-gradient method.

I use Java multithreading for the matrix back-substitution. The threads are synchronized using CyclicBarrier and CountDownLatch.

Why does it take so much time to synchronize the threads? Are there other ways to do it?

Code snippet:

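private void syncThreads() {
    try {
        // Block until every participating thread has reached the barrier.
        barrier.await();
    } catch (InterruptedException e) {
        // Restore the interrupt flag instead of silently swallowing it.
        Thread.currentThread().interrupt();
    } catch (BrokenBarrierException e) {
        // Another thread was interrupted or timed out while waiting.
        throw new IllegalStateException(e);
    }
}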
Egor Ivanov
  • It all depends on how you divide the work between the threads. The more independent they are, the less they synchronize and the faster your program will be. Another thing: do you have a multi-core computer? – Piotr Findeisen May 31 '11 at 15:54
  • 3
    context switching is a bitch ... –  May 31 '11 at 15:55
  • @Piotr - multiple cores won't actually help unless the JVM is built to take advantage of them. – Ted Hopp May 31 '11 at 15:58
  • Synchronization takes about 2 microseconds. This means that if each thread spends less than 2 microseconds doing useful work, you are better off using one thread without synchronization. – Peter Lawrey May 31 '11 at 15:58
  • @Ted, The JVM has been built to take advantage of multiple threads since version 1.0. The problem occurs when the overhead of using multi-thread is higher than the useful work done. – Peter Lawrey May 31 '11 at 15:59
  • @Peter - My point was that using multiple Java threads and using multiple cores are not at all the same thing. Many JVMs, especially in the early days, ran in one kernel thread. You could have all the Java threads you like, they all would share a single core. If you have a CPU meter that shows core loading, you can test your JVM by writing a little app that fires off several compute-bound Java threads. You might be surprised, even these days. (My Android emulator, for instance, won't load more than one core.) – Ted Hopp May 31 '11 at 16:27
  • 1
    @Ted, Java only used green threads in 1.0 version on Solaris. Support has been in the JVM on Windows/Linux all along. I didn't think of Android, but it doesn't have a JVM ;) – Peter Lawrey May 31 '11 at 16:31
  • @Peter - Android doesn't have a JVM?! No wonder my programs don't work right! :) – Ted Hopp May 31 '11 at 17:09
  • @Ted, I think it's called DVM. ;) – Peter Lawrey May 31 '11 at 21:15

5 Answers

8

You need to ensure that each thread spends more time doing useful work than it costs in overhead to pass a task to another thread.

Here is an example of where the overhead of passing a task to another thread far outweighs the benefits of using multiple threads.

final double[] results = new double[10*1000*1000];
{
    long start = System.nanoTime();
    // using a plain loop.
    for(int i=0;i<results.length;i++) {
        results[i] = (double) i * i;
    }
    long time = System.nanoTime() - start;
    System.out.printf("With one thread it took %.1f ns per square%n", (double) time / results.length);
}
{
    ExecutorService ex = Executors.newFixedThreadPool(4);
    long start = System.nanoTime();
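    // submitting one tiny task per array element.
    for(int i=0;i<results.length;i++) {
        final int i2 = i;
        ex.execute(new Runnable() {
            @Override
            public void run() {
                // cast first so the product is computed in double and cannot overflow int
                results[i2] = (double) i2 * i2;
            }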
        });
    }
    ex.shutdown();
    ex.awaitTermination(1, TimeUnit.MINUTES);
    long time = System.nanoTime() - start;
    System.out.printf("With four threads it took %.1f ns per square%n", (double) time / results.length);
}

prints

With one thread it took 1.4 ns per square
With four threads it took 715.6 ns per square

Using multiple threads is much worse.

However, increase the amount of work each thread does and the result changes:

final double[] results = new double[10 * 1000 * 1000];
{
    long start = System.nanoTime();
    // using a plain loop.
    for (int i = 0; i < results.length; i++) {
        results[i] = Math.pow(i, 1.5);
    }
    long time = System.nanoTime() - start;
    System.out.printf("With one thread it took %.1f ns per pow 1.5%n", (double) time / results.length);
}
{
    int threads = 4;
    ExecutorService ex = Executors.newFixedThreadPool(threads);
    long start = System.nanoTime();
    int blockSize = results.length / threads;
    // submitting one task per contiguous block of the array.
    for (int i = 0; i < threads; i++) {
        final int istart = i * blockSize;
        final int iend = (i + 1) * blockSize;
        ex.execute(new Runnable() {
            @Override
            public void run() {
                for (int i = istart; i < iend; i++)
                    results[i] = Math.pow(i, 1.5);
            }
        });
    }
    ex.shutdown();
    ex.awaitTermination(1, TimeUnit.MINUTES);
    long time = System.nanoTime() - start;
    System.out.printf("With four threads it took %.1f ns per pow 1.5%n", (double) time / results.length);
}

prints

With one thread it took 287.6 ns per pow 1.5
With four threads it took 77.3 ns per pow 1.5

That's an almost 4x improvement.
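
The same principle can be applied to the barrier in the question. The following is a minimal sketch, not the asker's actual code: the class name `BarrierSketch`, the method `run`, and the trivial per-element update are all made up for illustration. Each worker processes a whole block of the array and only then calls `barrier.await()`, so there is one synchronization per round rather than one per element.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical illustration: names and the per-element work are assumptions.
class BarrierSketch {
    // Runs `iterations` rounds. In each round every worker updates its own
    // block of x, then all workers meet at the barrier before the next round.
    static double[] run(int n, int threads, int iterations) {
        final double[] x = new double[n];
        final CyclicBarrier barrier = new CyclicBarrier(threads);
        Thread[] workers = new Thread[threads];
        int blockSize = (n + threads - 1) / threads; // round up so nothing is skipped
        for (int t = 0; t < threads; t++) {
            final int from = Math.min(n, t * blockSize);
            final int to = Math.min(n, from + blockSize);
            workers[t] = new Thread(() -> {
                try {
                    for (int it = 0; it < iterations; it++) {
                        for (int i = from; i < to; i++) {
                            x[i] += 1.0; // stand-in for the real per-element work
                        }
                        barrier.await(); // one synchronization per round
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return x;
    }

    public static void main(String[] args) {
        double[] x = run(1000 * 1000, 4, 10);
        System.out.println("x[0] after 10 rounds = " + x[0]);
    }
}
```

If the work between two `await()` calls is only a few array elements, the barrier cost will probably dominate, as in the first benchmark above.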

Peter Lawrey
7

How many threads are being used in total? That is likely the source of your problem. Using multiple threads will only really give a performance boost if:

  • Each task in the thread does some sort of blocking, for example waiting on I/O. Using multiple threads in this case lets other threads make use of that blocking time.
  • Or you have multiple cores. If you have 4 cores or 4 CPUs, you can run 4 tasks (or 4 threads) simultaneously.

It sounds like you are not blocking in the threads, so my guess is that you are using too many of them. If, for example, you use 10 threads to do the work at the same time but only have 2 cores, that would likely be much slower than running all of the tasks in sequence. Generally, start with a number of threads equal to your number of cores/CPUs. Then increase the thread count slowly, gauging the performance each time. This will give you the optimal thread count to use.
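
One way to try this out is sketched below. It is only an illustration: the class `PoolSizingSketch`, the method `fill`, and the `Math.pow` workload are assumptions, not the asker's code. It reads the core count with `Runtime.getRuntime().availableProcessors()` and times the same job with 1, 2, 4, ... threads up to twice that count.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: names and workload are assumptions.
class PoolSizingSketch {
    // Fills results[i] = i^1.5 using `threads` workers, one contiguous block each.
    static double[] fill(int n, int threads) {
        final double[] results = new double[n];
        ExecutorService ex = Executors.newFixedThreadPool(threads);
        int blockSize = (n + threads - 1) / threads; // round up so nothing is skipped
        for (int t = 0; t < threads; t++) {
            final int from = Math.min(n, t * blockSize);
            final int to = Math.min(n, from + blockSize);
            ex.execute(() -> {
                for (int i = from; i < to; i++) {
                    results[i] = Math.pow(i, 1.5);
                }
            });
        }
        ex.shutdown();
        try {
            ex.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Try 1, 2, 4, ... threads, up to twice the number of cores.
        for (int threads = 1; threads <= 2 * cores; threads *= 2) {
            long start = System.nanoTime();
            fill(10 * 1000 * 1000, threads);
            long time = System.nanoTime() - start;
            System.out.printf("%d thread(s): %.1f ms%n", threads, time / 1e6);
        }
    }
}
```

The timings depend on the machine and on JIT warm-up, so treat a single run as a rough indication only.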

Chris Dail
1

Perhaps you could try to re-implement your code using fork/join from JDK 7 and see what it does?

By default it creates a thread pool with exactly as many threads as you have cores in your system. If you choose a reasonable threshold for dividing your work into smaller chunks, this will probably execute much more efficiently.
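
A minimal sketch of what that could look like, under assumptions: the class names `ForkJoinSketch` and `PowTask`, the `Math.pow` workload, and the `THRESHOLD` value of 100,000 elements are all placeholders to be tuned, not a recommendation. The task splits its range in half until a piece is at or below the threshold, then processes that piece with a plain loop.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical illustration: names, workload and threshold are assumptions.
class ForkJoinSketch {
    // Ranges at or below this size are processed sequentially.
    static final int THRESHOLD = 100_000;

    static class PowTask extends RecursiveAction {
        private final double[] results;
        private final int from;
        private final int to;

        PowTask(double[] results, int from, int to) {
            this.results = results;
            this.from = from;
            this.to = to;
        }

        @Override
        protected void compute() {
            if (to - from <= THRESHOLD) {
                // Small enough: do the work directly.
                for (int i = from; i < to; i++) {
                    results[i] = Math.pow(i, 1.5);
                }
            } else {
                // Too large: split in half and run both halves, possibly in parallel.
                int mid = (from + to) >>> 1;
                invokeAll(new PowTask(results, from, mid),
                          new PowTask(results, mid, to));
            }
        }
    }

    static double[] fill(int n) {
        double[] results = new double[n];
        // The no-argument constructor sizes the pool to the number of available processors.
        ForkJoinPool pool = new ForkJoinPool();
        pool.invoke(new PowTask(results, 0, n));
        pool.shutdown();
        return results;
    }

    public static void main(String[] args) {
        double[] r = fill(10 * 1000 * 1000);
        System.out.println("last element = " + r[r.length - 1]);
    }
}
```

A threshold that is too small brings back the per-task overhead problem; one that is too large leaves cores idle.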

Arjan Tijms
1

You are most likely aware of this, but in case you aren't, please read up on Amdahl's Law. It gives the relationship between expected speedup of a program by using parallelism and the sequential segments of the program.
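
For reference, Amdahl's Law says that if a fraction p of the run time can be parallelized over n threads, the best possible speedup is 1 / ((1 - p) + p / n). The small sketch below just evaluates that formula; the class name `AmdahlSketch` and the sample value p = 0.9 are chosen for illustration only.

```java
// Hypothetical illustration of the Amdahl's Law formula.
class AmdahlSketch {
    // p: fraction of the run time that can be parallelized (0..1)
    // n: number of threads/cores
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        double p = 0.9; // assumed value, for illustration
        for (int n = 1; n <= 64; n *= 2) {
            System.out.printf("p=%.2f, n=%d: at most %.2fx%n", p, n, speedup(p, n));
        }
    }
}
```

With p = 0.9, four threads give at most about 3.08x, and no number of threads can exceed 10x, because the sequential 10% remains.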

Binil Thomas
0

Synchronizing across cores is much slower than in a single-core environment. See if you can limit the JVM to 1 core (see this blog post),

or you can use an ExecutorService and use invokeAll to run the parallel tasks.
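
A minimal sketch of the invokeAll approach, assuming a dot product as the parallel task (a typical step in conjugate gradient, but not taken from the asker's code). The class `InvokeAllSketch` and the method `dot` are hypothetical names. `invokeAll` returns only after every task has completed, so no explicit barrier or latch is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: names and the dot-product workload are assumptions.
class InvokeAllSketch {
    // Computes the dot product of a and b, one contiguous block per task.
    static double dot(final double[] a, final double[] b, int threads) {
        ExecutorService ex = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            int blockSize = (a.length + threads - 1) / threads; // round up
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(a.length, t * blockSize);
                final int to = Math.min(a.length, from + blockSize);
                tasks.add(() -> {
                    double sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += a[i] * b[i];
                    }
                    return sum;
                });
            }
            double total = 0;
            // invokeAll blocks until all tasks are done, then we add the partial sums.
            for (Future<Double> f : ex.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            ex.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4, 5};
        double[] b = {1, 1, 1, 1, 1};
        System.out.println("dot = " + dot(a, b, 2));
    }
}
```

In a real solver the pool would be created once and reused across iterations rather than rebuilt on every call, since creating threads is itself expensive.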

ratchet freak