5

Consider the following piece of code:

package com.sarvagya;

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class Streamer {
    private static final int LOOP_COUNT = 2000;
    public static void main(String[] args){
        try{
            for(int i = 0; i < LOOP_COUNT; ++i){
                poolRunner();
                System.out.println("done loop " + i);
                try{
                    Thread.sleep(50L);
                }
                catch (Exception e){
                    System.out.println(e);
                }
            }
        }
        catch (ExecutionException | InterruptedException e){
            System.out.println(e);
        }

        // Add a delay outside the loop to make sure all daemon threads are cleared before main exits.
        try{
            Thread.sleep(10 * 60 * 1000L);
        }
        catch (Exception e){
            System.out.println(e);
        }
    }

    /**
     * poolRunner method.
     * Assume I don't have any control over this method e.g. done by some library.
     * @throws InterruptedException
     * @throws ExecutionException
     */
    private static void poolRunner() throws InterruptedException, ExecutionException {
        ForkJoinPool pool = new ForkJoinPool();
        pool.submit(() -> {
            List<Integer> numbers = Arrays.asList(1,2,3,4,5,6,7,8,9,10, 11,12,14,15,16);
            List<Integer> collect = numbers.stream()
                    .parallel()
                    .filter(xx -> xx > 5)
                    .collect(Collectors.toList());
            System.out.println(collect);
        }).get();
    }
}

In the above code, the poolRunner method creates a ForkJoinPool and submits a task to it. When using Java 8 with LOOP_COUNT set to 2000, the maximum number of threads created was about 3600, as seen in the profiling snapshots below:

fig: Profiling in JDK 8

fig: Max threads in JDK 8

All these threads drop to almost 10 after some period of time. In OpenJDK 11, however, the same LOOP_COUNT produces the following error:

[28.822s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 4k, detached.
[28.822s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 4k, detached.
[28.822s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 4k, detached.
Exception in thread "ForkJoinPool-509-worker-5" java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
    at java.base/java.lang.Thread.start0(Native Method)
    at java.base/java.lang.Thread.start(Thread.java:803)
    at java.base/java.util.concurrent.ForkJoinPool.createWorker(ForkJoinPool.java:1329)
    at java.base/java.util.concurrent.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1352)
    at java.base/java.util.concurrent.ForkJoinPool.signalWork(ForkJoinPool.java:1476)
    at java.base/java.util.concurrent.ForkJoinPool.deregisterWorker(ForkJoinPool.java:1458)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:187)

It reaches the max thread limit very quickly. Setting LOOP_COUNT to 500 works fine; however, the threads are cleared very slowly and plateau at about 500. See the images below:

fig: Thread info in OpenJDK 11

fig: Profiling in OpenJDK 11

Threads were PARKED in JDK 8 but are in WAIT state in JDK 11. The number of daemon threads should come down in Java 11 as well; however, it happens slowly and doesn't work as expected. Moreover, assume I don't have control over the poolRunner method; consider that it is provided by some external library.

Is this an issue with OpenJDK 11, or am I doing something wrong in the code? Thanks.

Mahadeva
  • Are you sure your testing conditions are exactly the same besides the JDK version? I see different max heap and the app name seems different from your 2 profiling snapshots – Adonis Jan 08 '19 at 15:29
  • Yes, the same code is being checked in JDK 11. The IntelliJ configuration was changed to test this code – Mahadeva Jan 08 '19 at 15:32
  • What about all the other settings? Same hardware, same params given to the VM? Same GC? ...etc – Adonis Jan 08 '19 at 15:33

2 Answers

9

Your code creates a huge number of ForkJoinPool instances and never calls shutdown() on any pool after its use. Since, in the case of Java 8, nothing in the specification guarantees that the worker threads will terminate, this code could even end up with 2000 (⟨number of pools⟩) times ⟨number of cores⟩ threads.

In practice, the observed behavior stems from an undocumented idle timeout of two seconds. Note that, according to the comment in the JDK 8 source, the consequence of an elapsed timeout is an attempt to shrink the number of workers, which is different from just terminating: if n threads experience the timeout, not all n threads terminate; instead, the number of threads is reduced by one, and the remaining threads may wait again. Further, as the phrase "initial timeout value" already hints, the actual timeout is incremented each time it elapses. So it takes n * (n + 1) seconds for n idle worker threads to terminate due to this (undocumented) timeout; for example, eight idle workers need 8 × 9 = 72 seconds, matching the Java 8 measurement below.

Starting with Java 9, there is a configurable keepAliveTime which can be specified in a new constructor of ForkJoinPool; that constructor's documentation also states the default value:

keepAliveTime
the elapsed time since last use before a thread is terminated (and then later replaced if needed). For the default value, use 60, TimeUnit.SECONDS.
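
For illustration, here is a minimal sketch using that Java 9+ ten-argument constructor; the class name and all numeric values are arbitrary example choices, not recommendations:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

public class KeepAliveDemo {
    public static void main(String[] args) {
        // Java 9+ constructor; keepAliveTime is carried by the last two arguments.
        ForkJoinPool pool = new ForkJoinPool(
                4,                                               // parallelism
                ForkJoinPool.defaultForkJoinWorkerThreadFactory, // thread factory
                null,                                            // no UncaughtExceptionHandler
                false,                                           // asyncMode: false = default LIFO queues
                0,                                               // corePoolSize: keep no idle threads around
                4,                                               // maximumPoolSize
                1,                                               // minimumRunnable
                null,                                            // saturate predicate (null: reject on saturation)
                10, TimeUnit.SECONDS);                           // keepAliveTime: trim idle workers after ~10s
        try {
            pool.submit(() -> System.out.println("running")).join();
        } finally {
            pool.shutdown(); // still the reliable way to release the workers
        }
    }
}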

That documentation may mislead into thinking that now all worker threads may terminate together after being idle for keepAliveTime, but in fact, the pool still shrinks by only one thread at a time, though now the time does not increase. So it takes up to 60 * n seconds for n idle worker threads to terminate. Since the previous behavior was unspecified, it's not even an incompatibility.

It must be emphasized that even with the same timeout behavior, the resulting maximum number of threads could change: when a newer JVM with better code optimizations reduces the execution time of the actual operations (without artificial insertions of Thread.sleep(…)), it creates new threads faster, while the termination is still bound to wall-clock time.


The takeaway is that you should never rely on the automatic worker thread termination when you know that a thread pool is not needed anymore. Instead, you should call shutdown() when you are done.
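
When the pool is under your control, that can look like this (a minimal sketch; the class name and the task are just placeholders):

import java.util.concurrent.ForkJoinPool;

public class ShutdownDemo {
    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.submit(() -> System.out.println("work")).join();
        } finally {
            // Release the worker threads deterministically instead of
            // relying on the unspecified idle-timeout behavior.
            pool.shutdown();
        }
    }
}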


You may verify the behavior with the following code:

import java.util.Collections;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class PoolShrinkTest {
    public static void main(String[] args) throws InterruptedException {
        int threadNumber = 8;
        ForkJoinPool pool = new ForkJoinPool(threadNumber);
        // force the creation of all worker threads
        pool.invokeAll(Collections.nCopies(threadNumber * 2, () -> {
            Thread.sleep(500);
            return "";
        }));
        int oldNum = pool.getPoolSize();
        System.out.println(oldNum + " threads; waiting for dying threads");
        long t0 = System.nanoTime();
        while (oldNum > 0) {
            // wait until the pool size drops by one
            while (pool.getPoolSize() == oldNum)
                LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(200));
            long t1 = System.nanoTime();
            oldNum = pool.getPoolSize();
            System.out.println(threadNumber - oldNum + " threads terminated after "
                + TimeUnit.NANOSECONDS.toSeconds(t1 - t0) + "s");
        }
    }
}

Java 8:

8 threads; waiting for dying threads
1 threads terminated after 2s
2 threads terminated after 6s
3 threads terminated after 12s
4 threads terminated after 20s
5 threads terminated after 30s
6 threads terminated after 42s
7 threads terminated after 56s
8 threads terminated after 72s

Java 11:

8 threads; waiting for dying threads
1 threads terminated after 60s
2 threads terminated after 120s
3 threads terminated after 180s
4 threads terminated after 240s
5 threads terminated after 300s
6 threads terminated after 360s
7 threads terminated after 420s

It never finished; apparently, at least one last worker thread stays alive.

Holger
  • Holger, thanks for this answer. I see that calling `shutdown()` explicitly on the pool does improve performance, but the change is not under my control. Looks like this change needs to be done by the library authors. – Mahadeva Jan 12 '19 at 20:53
7

You are doing this incorrectly.

In above code, I am creating a ForkJoinPool and submitting some tasks to it.

Actually, you are creating 2000 ForkJoinPool instances...

Instead of doing that, you should create a single ForkJoinPool with an amount of parallelism (i.e. number of threads) that is appropriate to the task at hand.

Creating a huge number (i.e. thousands) of threads is a really bad idea. Even if you can do it without triggering an OOME, you will be consuming a lot of stack and heap memory and placing a lot of load on the scheduler and the garbage collector ... for no real benefit.
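
For illustration, a minimal sketch of that approach, assuming the loop from the question: one ForkJoinPool created once, reused for every iteration, and shut down at the end (the class name and pool sizing are example choices):

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class SharedPoolDemo {
    // One pool for the whole application, sized to the hardware.
    private static final ForkJoinPool POOL =
            new ForkJoinPool(Runtime.getRuntime().availableProcessors());

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        for (int i = 0; i < 2000; i++) {
            POOL.submit(() -> {
                List<Integer> collect = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
                        .stream()
                        .parallel()
                        .filter(x -> x > 5)
                        .collect(Collectors.toList());
                System.out.println(collect);
            }).get();
        }
        POOL.shutdown(); // release the workers when done
    }
}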

Stephen C
  • Stephen, the problem is that I have no control over the `poolRunner` method. Assume this comes from some library, which internally uses ForkJoinPool, but I will have to loop over this method. – Mahadeva Jan 08 '19 at 04:14
  • 1
    Sorry, but there is no solution under those circumstances. If that is the problem, then you need to **change** the problem if you want a solution. – Stephen C Jan 08 '19 at 04:17
  • 1
    I don't know. But I don't think it matters if you do things the right way. – Stephen C Jan 08 '19 at 05:37
  • 4
    The code is not only creating a huge number of pools, it forgets to call `shutdown()` on each pool after its use, so nothing in the specification guarantees that the worker threads will terminate at all. So for a 16 core machine, it could create 32000 never dying threads, all within the specification. – Holger Jan 08 '19 at 15:53
  • 4
    There was a reason we didn't enable parallel streams to run in a specified pool, but instead always run in the common pool - to prevent exactly the sort of shoot-yourself-in-the-foot that the author has taken extreme measures to arrange here. The author of `poolRunner()` thought they were being clever by "outsmarting" the runtime, and cutting and pasting "workaround" code from Stack Overflow that they didn't understand. This code is terminally broken; your only option is to not call it. – Brian Goetz Jan 09 '19 at 01:54
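
For reference, the behavior Brian Goetz alludes to: a parallel stream invoked directly runs in the JVM-wide ForkJoinPool.commonPool(), so no per-call pool is ever created and no extra worker threads accumulate. A minimal sketch of that default behavior (the class name is illustrative):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CommonPoolDemo {
    public static void main(String[] args) {
        // No explicit pool: the parallel stream runs in ForkJoinPool.commonPool(),
        // which is created once per JVM and reused for every parallel stream.
        List<Integer> collect = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
                .parallelStream()
                .filter(x -> x > 5)
                .collect(Collectors.toList());
        System.out.println(collect);
    }
}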