I’m dealing with multithreading in Java and, as someone pointed out to me, I noticed that threads warm up; that is, they get faster as they are repeatedly executed. I would like to understand why this happens, and whether it is specific to Java or a common behavior of every multithreaded program.

The code (by Peter Lawrey) that exemplifies it is the following:

for (int i = 0; i < 20; i++) {
    ExecutorService es = Executors.newFixedThreadPool(1);
    final double[] d = new double[4 * 1024];
    Arrays.fill(d, 1);
    final double[] d2 = new double[4 * 1024];
    // Run an empty task and wait for it, so the pool's thread exists before timing starts.
    es.submit(new Runnable() {
        @Override
        public void run() {
            // nothing.
        }
    }).get();
    long start = System.nanoTime();
    es.submit(new Runnable() {
        @Override
        public void run() {
            synchronized (d) {
                System.arraycopy(d, 0, d2, 0, d.length);
            }
        }
    });
    es.shutdown();
    es.awaitTermination(10, TimeUnit.SECONDS);
    // get all the values in d2.
    for (double x : d2) ;
    long time = System.nanoTime() - start;
    System.out.printf("Time to pass %,d doubles to another thread and back was %,d ns.%n", d.length, time);
}

Results:

Time to pass 4,096 doubles to another thread and back was 1,098,045 ns.
Time to pass 4,096 doubles to another thread and back was 171,949 ns.
 ... deleted ...
Time to pass 4,096 doubles to another thread and back was 50,566 ns.
Time to pass 4,096 doubles to another thread and back was 49,937 ns.

I.e., it gets faster and stabilises around 50,000 ns (about 50 µs). Why is that?

If I run this code (20 repetitions), then execute something else (let's say post-processing of the previous results and preparation for another multithreading round), and later execute the same Runnable on the same thread pool for another 20 repetitions, will it already be warmed up in any case?

In my program, I execute the Runnable in just one thread (actually one per processing core I have; it’s a CPU-intensive program), alternating many times with some other serial processing. It doesn’t seem to get faster as the program goes on. Maybe I could find a way to warm it up…

Palec
ursoouindio

2 Answers

It isn't the threads that are warming up so much as the JVM.

The JVM uses what's called JIT (just-in-time) compilation. While the program runs, the JVM analyzes what the program is actually doing and optimizes it on the fly, converting the bytecode it executes into native code that runs faster. Because the optimization is based on the actual runtime behavior, it can be tailored to your current situation. This can (though not always) result in substantial speedups, sometimes even beyond those of programs compiled to native code without such knowledge.
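One way to check that the warm-up comes from the JVM rather than from the threads is to time the same work repeatedly on a single thread. The sketch below is only an illustration: the class name `JitWarmupDemo` and its methods are made up for this example, and the timings it prints depend on the JVM, its flags, and the hardware. If the early rounds are noticeably slower than the later ones even though no second thread is involved, that points to JIT compilation and caching rather than to the thread pool.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original question.
public class JitWarmupDemo {

    // Same copy as in the question, but executed on the calling thread only.
    public static double copyAndSum(double[] src, double[] dst) {
        System.arraycopy(src, 0, dst, 0, src.length);
        double sum = 0;
        for (double x : dst) {
            sum += x; // use the copied values so the result depends on the copy
        }
        return sum;
    }

    // Times `rounds` consecutive calls and returns the elapsed nanoseconds of each.
    public static long[] timeRounds(int rounds, int size) {
        double[] src = new double[size];
        double[] dst = new double[size];
        Arrays.fill(src, 1);
        long[] elapsed = new long[rounds];
        double checksum = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            checksum += copyAndSum(src, dst);
            elapsed[i] = System.nanoTime() - start;
        }
        // Every element is 1, so each round must contribute exactly `size`.
        if (checksum != (double) rounds * size) {
            throw new IllegalStateException("unexpected checksum: " + checksum);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] elapsed = timeRounds(20, 4 * 1024);
        for (int i = 0; i < elapsed.length; i++) {
            System.out.printf("Round %d took %,d ns.%n", i, elapsed[i]);
        }
    }
}
```

For serious measurements a benchmarking harness is a better choice than hand-rolled `System.nanoTime()` loops; this is just enough to see the trend.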

You can read a bit more at http://en.wikipedia.org/wiki/Just-in-time_compilation

You could get a similar effect in any program as code is loaded into the CPU caches, but I believe the difference would be smaller.

rfeak
  • Thanks for the explanation, @rfeak. But do you think the compiler will be able to optimize my program? (Please read the last paragraph I added to the question.) – ursoouindio Mar 04 '11 at 20:17
  • The JIT compiler can only do so much, and only affects CPU time. If your serial process involves any sort of IO, there's little that can be done by the compiler. I would suggest profiling your program to see where time is spent and then attacking the biggest bottlenecks there if you need more performance. – rfeak Mar 04 '11 at 20:47
  • Actually, it doesn't have any IO. I give the initial conditions and the program runs on its own. I solve a special kind of differential equations. – ursoouindio Mar 04 '11 at 23:12
  • @ursoouindio - If this is only CPU bound, the JIT compiler should show some benefit. However, I stand by my suggestion. If you need more speed, use a profiler to find the bottlenecks and attack them. – rfeak Mar 07 '11 at 15:45

The only reasons I can see for a thread's execution getting faster are:

  • The memory manager can reuse already allocated object space (e.g., by letting heap allocations fill up the available memory until the maximum is reached, as set by the -Xmx option)

  • The working set is available in the hardware cache

  • Repeated operations might produce instruction sequences that the compiler can more easily reorder to optimize execution
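All three effects reward repetition inside the same JVM process, so one way to exploit them is to keep a single pool alive and run the task a few times before the timings start to matter. The following is a minimal sketch under that assumption; the class `WarmPoolDemo`, its method names, and the round counts are hypothetical, and whether the warm-up carries over to a different workload depends on how much of the same code and data that workload touches.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class, not part of the original question.
public class WarmPoolDemo {

    // Submits one copy task to the pool, waits for it, and returns the elapsed ns.
    public static long timeOneCopy(ExecutorService es, final double[] src, final double[] dst) {
        long start = System.nanoTime();
        try {
            es.submit(new Runnable() {
                @Override
                public void run() {
                    System.arraycopy(src, 0, dst, 0, src.length);
                }
            }).get(); // get() blocks until the worker thread has finished the copy
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    // Runs `warmupRounds` rounds whose timings are discarded, then returns
    // the timings of `measuredRounds` further rounds on the same pool.
    public static long[] measure(int warmupRounds, int measuredRounds, int size) {
        ExecutorService es = Executors.newFixedThreadPool(1); // one pool, reused throughout
        try {
            double[] src = new double[size];
            double[] dst = new double[size];
            Arrays.fill(src, 1);
            for (int i = 0; i < warmupRounds; i++) {
                timeOneCopy(es, src, dst); // result ignored: warm-up only
            }
            long[] elapsed = new long[measuredRounds];
            for (int i = 0; i < measuredRounds; i++) {
                elapsed[i] = timeOneCopy(es, src, dst);
            }
            return elapsed;
        } finally {
            es.shutdown(); // all tasks have completed, so the worker can exit
        }
    }

    public static void main(String[] args) {
        for (long ns : measure(20, 5, 4 * 1024)) {
            System.out.printf("Warm round took %,d ns.%n", ns);
        }
    }
}
```

Unlike the loop in the question, this version creates the executor once, so thread creation and pool setup are not repeated in every round.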

Johan Sjöberg
  • Are these reasons independent of the language, whether Java or another one? – ursoouindio Mar 04 '11 at 20:15
  • Yes and no. A lot of programming languages reuse object space for re-allocation, while `JIT` (just-in-time compilation) optimizations are specific mainly to JVM/.NET languages. The hardware cache, though, is common to all platforms. – Johan Sjöberg Mar 04 '11 at 20:18