0

We have a ThreadPoolExecutor-back task execution module. Some tasks are apparently finished in the application codes but the worker code somehow skip all the try/catch/finally and just go pick up the next task, causing the previous one to miss important status reporting code.

Like in this example, 257832:5 was picked up by thread-8 but this thread eventually just start another task.

2012-07-11 15:53:39,389 INFO  [pool-1-thread-6]: Task (258861:5) : Starting.
2012-07-11 15:53:39,389 INFO  [pool-1-thread-6]: A task running ### logged by app logic
2012-07-11 15:54:18,186 INFO  [pool-1-thread-6]: Task (258868:5) : Starting. ### pick up another suddenly!
2012-07-11 15:54:18,186 INFO  [pool-1-thread-6]: A task running ### logged by app logic
2012-07-11 15:54:18,445 INFO  [pool-1-thread-6]: Task (258868:5) : returned from Task.doWork.
2012-07-11 15:54:18,446 INFO  [pool-1-thread-6]: Task (258868:5) : finalizing.
2012-07-11 15:54:18,487 INFO  [pool-1-thread-6]: Task (258868:5) : notifying status result: 200.
2012-07-11 15:54:18,487 INFO  [pool-1-thread-6]: Task (258868:5) : Finished

The Runnable for the ThreadPoolExecutor) looks fine. It looks like

public void run() {
    log.info(String.format("Task (%s:%s) : Starting.", wfId, taskId));
    try {
        Object result = task.doWork(...); // call the application codes
        log.info(String.format("Task (%s:%s) : returned from Task.doWork.", wfId, taskId));
        status = "DONE";
    } catch (Throwable t) {
        log.error("Error executing task: " + taskId + " - " + className);
    } finally {
        log.info(String.format("Task (%s:%s) : finalizing.", wfId, taskId));
        // notify status to a server
        log.info(String.format("Task (%s:%s) : Finished", wfId, taskId));
    }
}
    // the task framework looks like
    // Use a queue size larger than numThreads, since the ThreadPoolExecutor#getActiveCount() call only returns an approximate count
    this.executor = new ThreadPoolExecutor(numThreads, numThreads, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(numThreads * 2));
    executor.execute(new TaskRunnable(...));

Note:

  1. we catch the ultimate Throwable but there is no exception logged between the two tasks
  2. there is no sign that exit() is called or program restart
  3. the first one skipped all the log lines, whether the one right after the call to the app code or the ones in the catch and finally blocks, and skip the status reporting codes
  4. this only happens randomly with low probability; but due to the large number of tasks executed, it still causes us a lot of headache.

It happens as if the thread pool executor simply evicts the running runnable magically (else it would require an InterruptedException which would be caught as a Throwable; Java threads are not to be stopped non-cooperatively except during a shutdown/exit) and skip all the blocks. I checked the ThreeadPoolExecutor javadoc, nothing should cause such event to happen.

What could have happened?

user207421
  • 305,947
  • 44
  • 307
  • 483
teddy
  • 2,158
  • 4
  • 26
  • 34
  • 3
    If it skipped *all* the log lines, including the one before the beginning of the try block, then it's possible the first logging call threw an exception, and since that line is not within the try/catch/finally then the exception would terminate the run method and the thread immediately. – David Conrad Jul 12 '12 at 01:04
  • but as seen in the log, the first line and the first line within the app logic are logged. – teddy Jul 12 '12 at 01:31
  • What are wfId and taskId? Where do they come from, and how are they set? Are they shared between threads? If so, how are they synchronized? – David Conrad Jul 12 '12 at 01:50
  • Could task.doWork(...) be causing the new task to be run synchronously? – Judge Mental Jul 12 '12 at 02:21

1 Answers1

0

One possible explanation for the log messages from the finally block no appearing is that an exception was thrown while the finally block was executing:

  • If log was null at that point, you would get an NPE. (If log is declared as final you can eliminate this as a possible cause.)

  • Depending on what the objects are, you could get an unchecked exception in the wfId or taskId objects' toString() methods.

  • The Logger object could be broken ...


It is also possible that you are looking at the wrong source code for some reason.


Yet another theoretical possibility is that the Android platform implements Thread.destroy() to actually do something, and something has called it on the thread. If this DEPRECATED method was implemented according to the original javadoc, you'd find that finally blocks were not executed. It might also be possible to do the equivalent of this from outside the JVM. But if this is what is going on, all bets are off!!

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • The log/wfId/taskId are initialized in the constructor of the runnable and are not modified throughout the codes. wfId/taskId are int/long types. – teddy Jul 12 '12 at 17:12