
I've been experimenting with different thread-pooling strategies using ThreadPoolExecutor on JDK 6. I have a priority queue working, but I wasn't sure I liked how the pool didn't shrink after keepAliveTime (which is what you get with an unbounded queue). So I'm looking at a ThreadPoolExecutor with a LinkedBlockingQueue and the CallerRunsPolicy.
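For reference, the unbounded-queue variant I'm moving away from looked something like this (I'm sketching it with a PriorityBlockingQueue since that's where my priorities lived; the exact queue class is illustrative). With an unbounded queue the pool never grows past corePoolSize, and core threads ignore keepAliveTime unless you opt in:

import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueueDemo {
    public static void main(String[] args) {
        //the queue always accepts the task, so the pool never grows past
        //the 10 core threads, and by default core threads never time out
        //no matter what keepAliveTime says
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(10, 50,
            5L, TimeUnit.SECONDS, new PriorityBlockingQueue<Runnable>());
        //tpe.allowCoreThreadTimeOut(true); //would let core threads expire
        tpe.shutdown();
    }
}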

The issue I'm having with the bounded-queue setup now is that the pool ramps up, as the docs explain it should, but after the tasks complete and keepAliveTime comes into play, getPoolSize() shows the pool being reduced to zero. The example code below should show the basis for my question:

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

public class ThreadPoolingDemo {
    private final static Logger LOGGER =
         Logger.getLogger(ThreadPoolingDemo.class.getName());

    public static void main(String[] args) throws Exception {
        LOGGER.info("MAIN THREAD:starting");
        runCallerTestPlain();   
    }

    private static void runCallerTestPlain() throws InterruptedException {
        //10 core threads, 
        //50 max pool size, 
        //100 tasks in queue, 
        //at max pool and full queue - caller runs task
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(10, 50,
            5L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(100),
            new ThreadPoolExecutor.CallerRunsPolicy());

        //dump 5000 tasks on the queue
        for (int i = 0; i < 5000; i++) {
            tpe.submit(new Runnable() {
                @Override
                public void run() {
                    //just to eat some time and give a little feedback
                    for (int j = 0; j < 20; j++) {
                        LOGGER.info("First-batch Task, looping:" + j + "["
                               + Thread.currentThread().getId() + "]");
                    }
                }
            }, null);
        }
        LOGGER.info("MAIN THREAD:!!Done queueing!!");

        //check tpe statistics forever
        while (true) {
            LOGGER.info("Active count: " + tpe.getActiveCount() + " Pool size: "
                 + tpe.getPoolSize() + " Largest Pool: " + tpe.getLargestPoolSize());
            Thread.sleep(1000);
        }
    }
}

I found an old bug that seems to be this issue, but it was closed: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6458662. Could this still be present in 1.6, or am I missing something?

It looks like I Rubber Ducked this one (http://www.codinghorror.com/blog/2012/03/rubber-duck-problem-solving.html). The bug I linked above is related to this one: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6576792, where the issue appears to be resolved in 1.7 (I loaded up 1.7 and verified: fixed). I guess my main problem was that a bug this fundamental remained for almost a decade. I spent too much time writing this up not to post it now; hope it helps someone.


1 Answer


... after the tasks complete and keepAliveTime comes into play, getPoolSize() shows the pool being reduced to zero.

So this looks to be a race condition in ThreadPoolExecutor. I guess it is working as designed, albeit not as expected. In the getTask() method, which the worker threads loop through to get tasks from the blocking queue, you see this code:

if (state == SHUTDOWN)  // Help drain queue
    r = workQueue.poll();
else if (poolSize > corePoolSize || allowCoreThreadTimeOut)
    r = workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS);
else
    r = workQueue.take();
if (r != null)
    return r;
if (workerCanExit()) {
    if (runState >= SHUTDOWN) // Wake up others
        interruptIdleWorkers();
    return null;
}

If poolSize grows above corePoolSize, and the poll then times out after keepAliveTime, the code falls through to workerCanExit() since r is null. All of the threads can get true back from that method at the same time, since it is just testing the current state of poolSize:

    mainLock.lock();
    boolean canExit;
    try {
        canExit = runState >= STOP ||
            workQueue.isEmpty() ||
            (allowCoreThreadTimeOut &&
             poolSize > Math.max(1, corePoolSize)); // << test poolSize here
    } finally {
        mainLock.unlock();                          // << race to workerDone() begins
    }
    return canExit;

Once that returns true, the worker thread exits, and only then is poolSize decremented. If all of the worker threads do that test at the same time, they will all exit, because of the race between the testing of poolSize and the stopping of the worker when --poolSize occurs.
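To see the shape of the problem outside the JDK internals, here is a toy sketch of the same check-then-act pattern (my own illustration, not the TPE code): release a batch of "workers" at once, let each test a shared count, and only decrement after the test. Far more workers can decide to exit than the core size should allow:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class CheckThenActRace {
    private static final int CORE = 10;
    private static final AtomicInteger poolSize = new AtomicInteger(50);

    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch timeout = new CountDownLatch(1);
        Thread[] workers = new Thread[50];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        timeout.await(); // all workers "time out" together
                    } catch (InterruptedException e) {
                        return;
                    }
                    // check: every worker can still see poolSize > CORE here
                    if (poolSize.get() > CORE) {
                        // act: the decrement lands after the check, so far
                        // more than (50 - CORE) workers may pass the test
                        poolSize.decrementAndGet();
                    }
                }
            });
            workers[i].start();
        }
        timeout.countDown(); // release everyone at once
        for (Thread w : workers) {
            w.join();
        }
        // on a multicore box this usually prints well below CORE, often 0
        System.out.println("poolSize after exits: " + poolSize.get());
    }
}

In the real executor the "check" is workerCanExit() and the "act" is the --poolSize in workerDone(); the window between them is the same.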

What surprises me is how consistent that race condition is. If you add some randomization to the sleep() inside of run() below, you can get some core threads not to quit, but I would have thought the race condition would be harder to hit.

You can see this behavior in the following test:

import static org.junit.Assert.assertEquals;

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.junit.Test;

@Test
public void test() throws Exception {
    int before = Thread.activeCount();
    int core = 10;
    int max = 50;
    int queueSize = 100;
    ThreadPoolExecutor tpe =
            new ThreadPoolExecutor(core, max, 1L, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<Runnable>(queueSize),
                    new ThreadPoolExecutor.CallerRunsPolicy());
    tpe.allowCoreThreadTimeOut(false);
    assertEquals(0, tpe.getActiveCount());
    // if we start 1 more than can go into core or queue, poolSize goes to 0
    int startN = core + queueSize + 1;
    // if we only start jobs the core can take care of, then it won't go to 0
    // int startN = core + queueSize;
    for (int i = 0; i < startN; i++) {
        tpe.submit(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        });
    }
    // watch the pool statistics forever; stop the test manually
    while (true) {
        System.out.println("active = " + tpe.getActiveCount() + ", poolSize = " + tpe.getPoolSize()
                + ", largest = " + tpe.getLargestPoolSize() + ", threads = " + (Thread.activeCount() - before));
        Thread.sleep(1000);
    }
}

If you change the sleep line inside of the run() method to something like this:

private final Random random = new Random();
...
    Thread.sleep(100 + random.nextInt(100));

This makes the race condition harder to hit, so some core threads will still be around.

  • The OP is more interested in `getPoolSize()` and not `getActiveCount()`. After all tasks complete and the keepAliveTime expires, all threads do in fact terminate, though the TPE does not. – John Vint Mar 28 '12 at 17:17
  • @Gray my TPE parameters are important. This does not always fail. Using the parameters you used we will see it work as expected. The params I specified play with the rules of TPE in a unique but also realistic way (i.e. a huge batch job being dumped on the queue). The main problem with your parameters is that you never trigger the situation that causes threads to be added to the pool above and beyond the core size. To exceed/add beyond core size the pool must be at core size (10) and the queue must be past its limit (100). Try your example with `for (int i = 0; i < max * 4; i++)`. – andematt Mar 28 '12 at 18:30
  • @andematt Interesting journey. This is a race condition in `ThreadPoolExecutor`. I've edited my answer. – Gray Mar 28 '12 at 19:43
  • @Gray - thanks for exploring it with me. I found that I could consistently get core threads to stick around if I put a sleep on the main thread near the duration of keepAliveTime and then added a few tasks that would themselves sleep for a second. The additional tasks would still be active when the keepAliveTime passed and would remain. Accepted your answer as it explains the bug report I linked and answers my question. Thanks again. – andematt Mar 29 '12 at 14:38
  • @andematt Sure thing, dude. Sure seems to me that there is a bug here. I can think of N ways of fixing that code. Strange. – Gray Mar 29 '12 at 14:42