We have an API written in scala 2.12.x and Play framework 2.5.x. The API connects to AWS aurora cluster using MariaDb connector/J 2.5.4 as such jdbc:mysql:aurora://some-aurora-cluster
Functionally everything works fine except we've noticed high CPU usage even when there is no traffic. A bit of research shows the following:
[ec2-user@ip-xxx-xxx-xxx-xxx ~]$ top -H
…
6373 root 20 0 4644452 990888 21540 S 14.6 12.6 1:15.68 MariaDb-failove
6374 root 20 0 4644452 990888 21540 S 13.6 12.6 1:16.11 MariaDb-failove
6305 root 20 0 4644452 990888 21540 S 13.3 12.6 1:14.31 MariaDb-failove
6375 root 20 0 4644452 990888 21540 S 12.3 12.6 1:14.59 MariaDb-failove
6372 root 20 0 4644452 990888 21540 S 11.3 12.6 1:15.78 MariaDb-failove
…
The above cmd shows a number of MariaDb-failover. I am not sure what it does and why there are multiple of it busy with high CPU usage.
[ec2-user@ip-xxx-xxx-xxx-xxx ~]$ netstat -a | less
…
tcp6 0 0 ip-xxx-xxx-xxx-31.:37446 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6 0 0 ip-xxx-xxx-xxx-31.:37108 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6 0 0 ip-xxx-xxx-xxx-31.:37648 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6 0 0 ip-xxx-xxx-xxx-31.:36934 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6 0 0 ip-xxx-xxx-xxx-31.:36870 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6 0 0 ip-xxx-xxx-xxx-31.:37254 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6 0 0 ip-xxx-xxx-xxx-31.:37902 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
…
There are a lot of TIME_WAIT. Again this is odd since there was no traffic at the time I executed this cmd.
[ec2-user@ip-xxx-xxx-xxx-xxx ~]$ netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n
1 established)
1 Foreign
1 SYN_SENT
6 LISTEN
37 ESTABLISHED
851 TIME_WAIT
There are hundred of TIME_WAIT; and the number is changing every time I executed the cmd.
Does anyone have any insights as to whether this is normal or something I need to worry about?
Please let me know if you have further questions.
========== More info
ps -aux | grep java
Got the PID: 2655
jstack 2655 > threaddump.log
Here is the content (pruned):
2020-06-16 16:44:39
Full thread dump OpenJDK 64-Bit Server VM (25.252-b09 mixed mode):
"MariaDb-failover-5" #276 daemon prio=5 os_prio=0 tid=0x00007f4b0400b000 nid=0x18e7 waiting on condition [0x00007f4b0c6d6000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"MariaDb-failover-4" #275 daemon prio=5 os_prio=0 tid=0x00007f4b0400a000 nid=0x18e6 waiting on condition [0x00007f4afbefd000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"MariaDb-failover-3" #274 daemon prio=5 os_prio=0 tid=0x00007f4b04009000 nid=0x18e5 waiting on condition [0x00007f4afbbfa000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"MariaDb-failover-2" #273 daemon prio=5 os_prio=0 tid=0x00007f4b04008000 nid=0x18e4 waiting on condition [0x00007f4b0c5d5000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"MariaDb-failover-1" #265 daemon prio=5 os_prio=0 tid=0x00007f4b2407e800 nid=0x18a1 waiting on condition [0x00007f4b0c2d4000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"VM Thread" os_prio=0 tid=0x00007f4b540db800 nid=0xf37 runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f4b54069000 nid=0xf35 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f4b5406a800 nid=0xf36 runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00007f4b54132800 nid=0xf3e waiting on condition
JNI global references: 2520
PIDs of LWPs in Hex:
6373 18e5
6374 18e6
6305 18a1
6375 18e7
6372 18e4
========== More info
Our API has 5 different db configurations - 5 different databases. Each has a connection string as such jdbc:mysql:aurora://some-aurora-cluster
Notice that aurora
mode is used for better experience of fail-over so that it explains the 5 light-weight processes. But they are chatty, resulting in many TIME_WAIT
and possibly the cause of elevated CPU usage.
Has anyone run into this situation before and how did you mitigate it? I still would like to use aurora
mode (or something equivalent) so that we don't have to restart the app for when the db fails over.