60

I have a web-based Java application that at times shows very high CPU utilization (almost 90%) for several hours, as reported by the Linux top command. Restarting the application makes the problem go away.

So to investigate:

I take thread dumps to find out what the threads are doing. Several threads are in the 'RUNNABLE' state, and some are in other states. Across repeated thread dumps, I see some threads that are always in the 'RUNNABLE' state, so they appear to be the culprits.

But I am unable to tell for sure which thread is hogging the CPU or has gone into an infinite loop (thereby causing the high CPU utilization).

Logs don't necessarily help, as the offending code may not be logging anything.

How do I investigate which part of the application, or which thread, is causing the high CPU utilization? Any other ideas?

Arpit Aggarwal
Jasper
  • Did you already try a profiler? – andreapier Apr 04 '13 at 12:43
  • 2
    Your thread dumps should also show where in the code these Runnable threads are during the thread dump. You need to look there in your code. IIRC "runnable" threads may be waiting on I/O and not be taking up CPU, but it's early and I'm still nursing my coffee. – Charles Forsythe Apr 04 '13 at 12:44
  • andreapier> I may not be able to use a profiler in the prod environment, but would a profiler tell which thread is hogging the CPU? – Jasper Apr 04 '13 at 13:48
  • Do you see many runnable threads on a similar line of code? If so, can you paste the thread dump for the common lines? – John Vint Apr 04 '13 at 15:15

7 Answers

57

If a profiler is not applicable in your setup, you can try to identify the thread by following the steps in this post.

Basically, there are three steps:

  1. Run top -H and note the PID of the thread with the highest CPU usage (with -H, the PID column shows per-thread IDs).
  2. Convert that PID to hex.
  3. Look for the thread whose nid matches the hex value in your thread dump.
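
Step 2 is a plain decimal-to-hexadecimal conversion. As a minimal sketch (the class and method names here are hypothetical, not taken from the linked post), the value from top -H can be turned into the nid=0x... form that HotSpot thread dumps print:

```java
// Hypothetical helper; class and method names are illustrative only.
final class ThreadIdHex {

    /** Converts a decimal thread ID (as shown by top -H) to the hex nid form used in thread dumps. */
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    public static void main(String[] args) {
        // Thread ID 5108 from top -H corresponds to nid=0x13f4 in a thread dump.
        System.out.println(toNid(5108L)); // prints 0x13f4
    }
}
```

Searching the thread dump for the resulting string (for example nid=0x13f4) should then lead to the stack trace of the busy thread.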
ericson
22

You may be the victim of a garbage collection problem.

When your application needs memory and is running close to its configured limit, the garbage collector runs often, which consumes a lot of CPU cycles. If it cannot free anything, memory stays low, so it runs again and again. When you redeploy your application, the memory is cleared and garbage collection happens no more than required, so CPU utilization stays low until the heap fills up again.

You should check that there is no memory leak in your application and that its memory is configured appropriately (check the -Xmx parameter; see What does Java option -Xmx stand for?).

Also, which web framework are you using? JSF relies heavily on sessions and consumes a lot of memory; consider being as stateless as possible!
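
One way to check this hypothesis from inside the JVM is the standard java.lang.management API. The sketch below (class and method names are illustrative assumptions) sums the accumulated collection time reported by each collector and relates it to JVM uptime. The reported times are approximate and, for concurrent collectors, are not pure pause time, so treat the result as a rough indicator only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; names are illustrative only.
final class GcOverhead {

    /** Total approximate time (ms) all collectors have spent collecting since JVM start. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Rough share of JVM uptime spent in garbage collection. */
    static double gcTimeFraction() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : (double) totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %d ms (%.1f%% of uptime)%n",
                totalGcMillis(), 100.0 * gcTimeFraction());
    }
}
```

A fraction that stays high while the heap remains nearly full would be consistent with the GC explanation; a low fraction points elsewhere.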

Community
Alexandre Jacob
  • stop the world gc (which is essentially what you are referring to) cannot run multi-threaded. You will find that it will use one of the cores in its entirety. – drone.ah Apr 04 '13 at 13:16
  • Memory utilization seems to be OK; I am watching that. From Java VisualVM, I can see that GC (CPU) activity is quite low. – Jasper Apr 04 '13 at 13:44
5

In the thread dump, you can find the line number as shown below.

For the main thread, which is currently running, the state is RUNNABLE and the last frame points to HighCPUUtilization.java, line 17:

"main" #1 prio=5 os_prio=0 tid=0x0000000002120800 nid=0x13f4 runnable [0x0000000001d9f000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:313)
    at com.rana.samples.HighCPUUtilization.main(HighCPUUtilization.java:17)
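
A RUNNABLE state alone does not say how much CPU a thread actually consumes, so it can help to measure it. The following is a rough sketch using the standard ThreadMXBean API; the ThreadCpuSampler class and its method are hypothetical names, and it assumes the JVM supports per-thread CPU time measurement:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names are illustrative only.
final class ThreadCpuSampler {

    /** CPU time one thread consumed during the sampling interval. */
    static final class Usage {
        final String threadName;
        final Thread.State state;
        final long cpuNanos;

        Usage(String threadName, Thread.State state, long cpuNanos) {
            this.threadName = threadName;
            this.state = state;
            this.cpuNanos = cpuNanos;
        }

        @Override
        public String toString() {
            return String.format("%-30s %-13s %d ms", threadName, state, cpuNanos / 1_000_000);
        }
    }

    /** Samples per-thread CPU time twice and returns the deltas, busiest thread first. */
    static List<Usage> busiest(long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Usage> result = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return result; // this JVM cannot measure per-thread CPU time
        }
        mx.setThreadCpuTimeEnabled(true);

        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id)); // -1 if the thread is no longer alive
        }
        try {
            Thread.sleep(intervalMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the flag and report what we have
        }
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id);
            ThreadInfo info = mx.getThreadInfo(id);
            if (start == null || start < 0 || end < 0 || info == null) {
                continue; // thread started or died during the interval
            }
            result.add(new Usage(info.getThreadName(), info.getThreadState(), end - start));
        }
        result.sort(Comparator.comparingLong((Usage u) -> u.cpuNanos).reversed());
        return result;
    }

    public static void main(String[] args) {
        for (Usage usage : busiest(1000)) {
            System.out.println(usage);
        }
    }
}
```

Threads that appear at the top of this list across several runs are better candidates than threads that merely show up as RUNNABLE in a dump.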
sampathsris
ranafeb14
  • This particular thread is waiting for `writeBytes` OS call to complete. It uses little to no CPU cycles. But the general approach is correct - do a thread dump and look for threads that are performing a computation. – rustyx Jan 15 '20 at 13:11
2

During these peak CPU times, what is the user load like? You say this is a web-based application, so the culprit that comes to mind is memory utilization. If you store a lot of data in the session, for instance, and the session count gets high enough, the app server will start thrashing. This is also a case where the GC might make matters worse, depending on the scheme you are using. More information about the app and the server configuration would help in suggesting further debugging ideas.

WPrecht
1

Flame graphs can be helpful in identifying the execution paths that are consuming the most CPU time.

In short, the steps to generate flame graphs are as follows:

yum -y install perf

wget https://github.com/jvm-profiling-tools/async-profiler/releases/download/v1.8.3/async-profiler-1.8.3-linux-x64.tar.gz

tar -xvf async-profiler-1.8.3-linux-x64.tar.gz
chmod -R 777 async-profiler-1.8.3-linux-x64
cd async-profiler-1.8.3-linux-x64

# Relax kernel restrictions on perf events and kernel symbols (requires root)
echo 1 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict

# Assumes exactly one running java process
JAVA_PID=$(pgrep java)

# Profile for 30 seconds and write the flame graph
./profiler.sh -d 30 -f flame-graph.svg "$JAVA_PID"

flame-graph.svg can be opened in a browser. In short, the width of each frame is proportional to the number of samples in which that execution path appeared.

There are a few other approaches to generating them:

  • By adding -XX:+PreserveFramePointer to the JVM options, as described here
  • Using async-profiler with -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints as described here

However, using async-profiler without any extra options, though not very accurate, requires no changes to the running Java process and adds little CPU overhead.

Their wiki provides details on how to leverage it. And more about flame graphs can be found here
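
To make the idea of "width equals number of samples" concrete, here is a rough in-process sketch that samples stack traces and counts identical call paths. It is not how async-profiler works internally, and Thread.getAllStackTraces only captures threads at safepoints, so the result is biased. The class name is hypothetical; the key format (frames joined by semicolons, root first) follows the collapsed-stack convention used by flame graph tools.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sampler; names are illustrative only.
final class FoldedStackSampler {

    /** Takes the given number of samples and counts how often each RUNNABLE call path was seen. */
    static Map<String, Integer> sample(int samples, long pauseMillis) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
                Thread thread = entry.getKey();
                StackTraceElement[] frames = entry.getValue();
                if (thread == Thread.currentThread()
                        || thread.getState() != Thread.State.RUNNABLE
                        || frames.length == 0) {
                    continue; // skip the sampler itself, idle threads, and threads without frames
                }
                StringBuilder folded = new StringBuilder();
                for (int f = frames.length - 1; f >= 0; f--) { // outermost (root) frame first
                    if (folded.length() > 0) {
                        folded.append(';');
                    }
                    folded.append(frames[f].getClassName()).append('.').append(frames[f].getMethodName());
                }
                counts.merge(folded.toString(), 1, Integer::sum);
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // stop sampling when interrupted
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Each output line is "path count"; a larger count would be a wider frame in a flame graph.
        sample(50, 20).forEach((path, count) -> System.out.println(path + " " + count));
    }
}
```

Call paths with the highest counts correspond to the widest towers in a flame graph.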

abrao
0

Your first approach should be to find all references to Thread.sleep and check that:

  1. Sleeping is the right thing to do - you should use some sort of wait mechanism if possible - perhaps careful use of a BlockingQueue would help.

  2. If sleeping is the right thing to do, are you sleeping for the right amount of time - this is often a very difficult question to answer.

The most common mistake in multi-threaded design is to believe that all you need to do when waiting for something to happen is to check for it and sleep for a while in a tight loop. This is rarely an effective solution - you should always try to wait for the occurrence.

The second most common issue is to loop without sleeping. This is even worse and is a little less easy to track down.
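
As a minimal sketch of waiting for the occurrence instead of polling, the consumer below blocks on a BlockingQueue until the producer hands over an item. The class name, method name, and the "job-1" payload are hypothetical and chosen only for illustration:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo; names are illustrative only.
final class BlockingConsumerDemo {

    /** Starts a producer thread and blocks until it delivers one item. */
    static String consumeOne() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(100);   // simulate work before the item is ready
                queue.put("job-1");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "producer");
        producer.start();

        try {
            // take() parks the calling thread until an item arrives:
            // no check-and-sleep loop and no busy spinning.
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(consumeOne()); // prints job-1
    }
}
```

Compared with a loop that checks a flag and sleeps, the waiting thread here wakes up only when there is something to do, so there is no sleep interval to tune.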

OldCurmudgeon
0

You did not tag the question with "linux", but you mentioned "Linux top", so this might be helpful:

Use the small Linux tool threadcpu to identify the threads using the most CPU. It calls jstack to get the thread names, and piping its output through "sort -n" gives you the list of threads ordered by CPU usage.

More details can be found here: http://www.tuxad.com/blog/archives/2018/10/01/threadcpu_-_show_cpu_usage_of_threads/index.html

And if you still need more details, create a thread dump or run strace on the thread.

reichhart