5

While analyzing the logs of one of the production environments, I saw a thread in "WAITING" state on a countdownlatch await()

...sun.misc.Unsafe.park(Native Method)
...java.util.concurrent.locks.LockSupport.park(Unknown Source)
...java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
...java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(Unknown Source)
...java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(Unknown Source)
...java.util.concurrent.CountDownLatch.await(Unknown Source)

The latch was initialized to 1 and another thread did call countDown() method on the same instance of the latch but still the main thread remains blocked on the latch. This has lead to the jvm being hung indefinitely.

Getting blocked even when the latch count reaches zero sounds unreasonable and I am looking out for suggestions on further troubleshooting this issue.

Any ideas?

Note - The jvm version used is as below

java version "1.5.0_15" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_15-b04) Java HotSpot(TM) Client VM (build 1.5.0_15-b04, mixed mode, sharing)

Update - Below is the code snippet of the thread I am talking about above

private class MyRunnable implements Runnable, Thread.UncaughtExceptionHandler {

        private AtomicBoolean shouldStop = new AtomicBoolean(false);
        private CountDownLatch stopLatch = new CountDownLatch(1);
        private Thread currentThread;

    public void run() {

        Thread.currentThread().setName("My Thread");
            Thread.currentThread().setUncaughtExceptionHandler(this);
            currentThread = Thread.currentThread();
            if (currentThread.isInterrupted()) {
                logger.debug("The pool thread had its interrupted stattus set. Clearing...");
                Thread.interrupted();
                logger.debug("The pool thread had its interrupted stattus set. Clearing...DONE");
            }
            try {
                doBusinessLogic(shouldStop);
            } catch (Exception e) {
                logger.error("An exception was encountered in the thread", e);
            } finally {
                if (currentThread.isInterrupted()) {
                    logger.debug("Clearing interupted status for the thread and returning to pool...");
                    Thread.interrupted();
                }
                stopLatch.countDown();
                logger.debug("Stopped task after counting down on the latch");
            }
        }

        public void stopThread() {
            shouldStop.set(true);
            logger.debug("Stop flag was set to true.. waiting for thread method to return...");
            try {
                stopLatch.await();
                logger.debug("Stop flag was set to true... task has finished. Returning.");
            } catch (InterruptedException e) {
                logger.error("Interrupted while awaiting thread stop event...", e);
            }
        }

        public void uncaughtException(Thread t, Throwable e) {
            logger.error("An uncaught exception occurred in the task thread ", e);
        }


        private void doBusinessLogic(AtomicBoolean shouldStop) {
    long sleepPeriod = 11;
    while (!shouldStop.get()) {
        try {
            Thread.sleep(sleepPeriod);
        } catch (InterruptedException e) {
            logger.debug("Thread was interrupted.Clearing interrupted status and proceeding", e);
            if (Thread.currentThread().isInterrupted())
                Thread.interrupted();
        }
        if (shouldStop.get()) {
            logger.debug("Stop flag was set. Returning.");
            return;
        }
        try {
            logger.debug("Performing business logic...");
            //.....
            logger.debug("Performing business logic...DONE");
        } catch (Throwable e) {
            logger.error("An exception occurred", e);
        }
        if (shouldStop.get()) {
            logger.debug("Stop flag was set. Returning.");
            return;
        }
    }

}


}

Here is what I see in the logs

DEBUG [main Thread] - Stop flag was set to true.. waiting for thread method to return...
DEBUG [My Thread]   - Stop flag was set. Returning.
DEBUG [My Thread]   - Stopped task after counting down on the latch

The logger statement after latch.await() is never printed and the thread dump also indicates that the main thread is blocked on the latch.

Andy Dufresne
  • 6,022
  • 7
  • 63
  • 113
  • 4
    You *say* that `countDown()` was called on the same instance - but I'd be more inclined to believe that there's a bug in your code and diagnosis than in the JRE libraries. Can you produce a short but complete program which demonstrates the issue? – Jon Skeet May 30 '12 at 05:45
  • If I could demonstrate it, I could have known probably known the root cause :). The reason why I say that countDown() was called is because the logger statements indicate it. There are log statements added after countDown() was called and these are seen in the logs. The thread dump though indicates that the main thread still waits on the latch count to be zero. – Andy Dufresne May 30 '12 at 06:02
  • I dare say `countDown` is called - but does your logging *also* indicate indisputably that it's called on the same instance? – Jon Skeet May 30 '12 at 06:04
  • Added the code snippet to get a clear idea. Let me know your suggestions. – Andy Dufresne May 30 '12 at 06:17
  • 2
    Do you only have one instance of this class though? Each instance has its own latch, so if you end up creating an extra instance somewhere and using that accidentally, that would explain it. – Jon Skeet May 30 '12 at 06:22
  • I analyzed the logs keeping this aspect in mind but found that the stopThread() method is always called on the right instance of the thread. The code makes sure that only one instance exists at a time. Any other loopholes? – Andy Dufresne May 30 '12 at 07:42
  • Not that I can see. Of course the fact that it's a relatively ancient build of Java means that you *may* be experiencing a bug which was fixed years ago... – Jon Skeet May 30 '12 at 08:31

1 Answers1

1

I would have liked to write this as a comment but my reputation doesn't allow commenting yet, so I'll offer my small experience here.

I run into the same issue. My countDownLatch was being accessed from synchronized methods: removing the synchronization solved the problem (while of course I had to adjust the methods' code to cope with the absence of synchronization). I've no idea how to explain this behaviour.

Raul Bertone
  • 146
  • 7
  • It's called deadlock – Enerccio Jul 13 '19 at 11:05
  • Nope, it ain't: like described in the original question, the latch count reached 0, so nothing was blocking the waiting thread. Still, it did not wake up. (as far as I can remember, it being 6 years ago...) – Raul Bertone Jul 14 '19 at 12:32
  • I am reacting to your 'My countDownLatch was being accessed from synchronized methods: removing the synchronization solved the problem (while of course I had to adjust the methods' code to cope with the absence of synchronization). I've no idea how to explain this behaviour.' This is called deadlock because whomever entered synchronized block is waiting for someone else to enter synchronized block but can't until this thread leaves.... – Enerccio Jul 14 '19 at 15:18