3

I have a web service which receives requests and after performing certain validations dumps data to a Message Queue. This service creates a ThreadPoolExecutor internally. I use Prometheus and Grafana to visualize application metrics and observed that CPU usage is always increasing and never reducing even during low workload (even if I detach the application from Load Balancer and serve no traffic).

Earlier I was running application using Jetty 9.0.6 and am observing this behavior after switching to Jetty 9.4.22. Reverting back to Jetty 9.0.6 (or even 9.4.13) seems to resolve this issue. So I suspect this might be related to Jetty (9.4.13+).

So I tried to analyze top command output and connect to application using VisualVM at the same time.

Linux top command reported that a certain thread from the ThreadPoolExecutor was using high CPU with CPU time over 500 minutes, however Visual VM showed that the thread is in waiting state.

top command output (Few threads report high CPU usage for prolonged duration like following when they're actually in WAITING state)

10641 app        20   0 30.4g  21g 112m S 47.9 70.2 657:12.56 657:12 pool-5-thread-1
10914 app        20   0 30.4g  21g 112m R 33.7 70.2 508:25.46 508:25 pool-5-thread-1
10009 app        20   0 30.4g  21g 112m R 33.7 70.2 454:39.50 454:39 pool-5-thread-5
10220 app        20   0 30.4g  21g 112m S 33.7 70.2 444:55.14 444:55 pool-5-thread-7
12230 app        20   0 30.4g  21g 112m R 32.8 70.2 438:18.02 438:18 pool-5-thread-2

The thread state from Thread Dump

pool-5-thread-1
priority:5 - threadId:0x00007f13b0033000 - nativeId:0x8c9 - nativeId (decimal):2249 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000037ca83da0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

pool-5-thread-5
priority:5 - threadId:0x00007f13a0020800 - nativeId:0x8cd - nativeId (decimal):2253 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000037ca83da0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

pool-5-thread-7
priority:5 - threadId:0x00007f139c021800 - nativeId:0x8cf - nativeId (decimal):2255 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000037ca83da0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

pool-5-thread-2
priority:5 - threadId:0x00007f13bc021800 - nativeId:0x8ca - nativeId (decimal):2250 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000037ca83da0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

I need help understanding if this could be related to switching from Oracle JDK 8u65 to Amazon Corretto JDK 8u232

CPU usage as monitored in EC2 console CPU usage as monitored in EC2 console

Update: Full top output (from another occurrence) -

top - 12:16:34 up  5:03,  0 users,  load average: 10.64, 9.74, 9.72
Threads: 7123 total,  10 running, 7113 sleeping,   0 stopped,   0 zombie
%Cpu(s): 28.3 us, 16.0 sy,  0.0 ni, 54.8 id,  0.0 wa,  0.0 hi,  0.9 si,  0.0 st
KiB Mem:  32121548 total, 26088752 used,  6032796 free,   134028 buffers
KiB Swap:        0 total,        0 used,        0 free.  2008744 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 5660 app        20   0 30.582g 0.021t  97556 S 47.0 69.6 128:42.34 pool-5-thread-3
 4804 app        20   0 30.582g 0.021t  97556 R 49.6 69.6  98:41.07 pool-5-thread-1
 2695 app        20   0 30.582g 0.021t  97556 R 49.9 69.6  93:10.97 pool-5-thread-5
 2392 app        20   0 30.582g 0.021t  97556 S 48.0 69.6  79:47.46 pool-5-thread-9
 2284 app        20   0 30.582g 0.021t  97556 S  3.2 69.6  13:04.52 pool-5-thread-2

JVM flags being used -

-XX:+UseG1GC
-XX:InitiatingHeapOccupancyPercent=70
-Xss228k
-XX:G1HeapRegionSize=4
-XX:MaxGCPauseMillis=200
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:NativeMemoryTracking=detail
-XX:+PrintTenuringDistribution
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=20
-XX:GCLogFileSize=512M
-Xloggc:/home/app/logs/gc/${CURRENT_DATETIME}.log
-Xms100m
-Xmx${JAVA_XMX}m
-XX:ActiveProcessorCount=${CPUS}
-XX:MaxMetaspaceSize=256m
-XX:CompressedClassSpaceSize=200m
-Dcom.sun.management.jmxremote=true
-Dcom.sun.management.jmxremote.port=6001
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.rmi.port=6001
-Djava.rmi.server.hostname=${HOST}
-XX:+UsePerfData
Dhruvil Vaghela
  • 580
  • 6
  • 13

1 Answers1

0

It is possible that you have hit JDK-8129861 that has been fixed in Java 9. It may also be related to JDK-8022642.

The workaround is to set core pool size at least 1:

ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);

References: https://bugs.openjdk.java.net/browse/JDK-8129861 https://bugs.openjdk.java.net/browse/JDK-8022642

ScheduledThreadPoolExecutor with corePoolSize = 0 causes 100% load on one CPU core

Md.
  • 496
  • 4
  • 11