
I have a CentOS server running a Java web app (Tomcat 6 + Hibernate + MySQL + Struts 2).

CPU usage is usually about 10%, but sometimes it suddenly jumps to 100% and the application freezes. The process responsible is the java command, and the server has to be rebooted to get things back to normal. This happens at completely irregular intervals, so I think it is unlikely to be an application bug.

This is the top output under normal conditions:

top - 12:50:35 up 21 min,  1 user,  load average: 0.13, 0.18, 0.21
Mem:   8300688k total,   836232k used,  7464456k free,    22168k buffers
Swap: 16779884k total,        0k used, 16779884k free,   309080k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  SWAP   TIME CODE DATA nFLT COMMAND
 3292 tomcat    18   0 1382m 415m  10m S 11.0  5.1   2:55.45 967m   2:55   36 1.3g  537 java
 3165 mysql     15   0  137m  25m 4908 S  5.3  0.3   0:26.64 111m   0:26 6496 124m   82 mysqld
 3456 root      34  19 25660   9m 2076 S  0.0  0.1   0:00.01  15m   0:00    4 8060    2 yum-updatesd
 3345 root      18   0 23040 9420 5520 S  0.0  0.1   0:00.08  13m   0:00  300 3860   20 httpd
 3421 apache    18   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3422 apache    18   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3423 apache    18   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3424 apache    18   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3425 apache    23   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3426 apache    24   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3427 apache    23   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3428 apache    23   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 2951 haldaemo  19   0  5744 3944 1692 S  0.0  0.0   0:00.52 1800   0:00  268 2236    0 hald
 2669 named     19   0  109m 3684 1928 S  0.0  0.0   0:00.08 105m   0:00  364 102m    3 named

and when the problem occurs:

top - 12:25:10 up 59 min,  3 users,  load average: 1.09, 0.97, 0.64
Tasks: 192 total,   1 running, 189 sleeping,   2 stopped,   0 zombie
Cpu(s): 12.5%us,  0.0%sy,  0.0%ni, 87.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8300688k total,  2303376k used,  5997312k free,    85104k buffers
Swap: 16779884k total,        0k used, 16779884k free,   882748k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  SWAP   TIME CODE DATA nFLT COMMAND
 6609 root      18   0 1356m 1.2g  10m S 101.9 14.8   4:50.37 154m   4:50   36 1.3g    1 java
    1 root      15   0  2068  628  536 S  0.0  0.0   0:01.25 1440   0:01   32  280   20 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 watchdog/0
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 migration/1
    6 root      34  19     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 ksoftirqd/1
    7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 watchdog/1
    8 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 migration/2
    9 root      34  19     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 ksoftirqd/2
   10 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 watchdog/2
   11 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 migration/3
   12 root      34  19     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 ksoftirqd/3
   13 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 watchdog/3

Interestingly, the java process runs as user tomcat when everything is fine, but as root when the problem occurs.

What could cause this?

SJ.Jafari
  • That's an easy one: your Java application is the culprit. Ask the developer what can cause this. – mailq Aug 16 '11 at 08:30
  • Or look into the log files. Chances are you will find an entry that gives you a hint (or us, if you add it to your question). – Sven Aug 16 '11 at 08:42
  • Actually, I happen to be the developer here. There had been no such issue since the application launched (1 year ago), and we had made no significant changes when this started to occur. Could it be a hosting problem? The host we're using has been having some difficulties with its data centers lately, but I can't connect these two facts. – SJ.Jafari Aug 16 '11 at 09:39

2 Answers


Most likely a thread is hanging or spinning.

kill -3 processid

This makes the JVM print a dump of all threads in the Java app to its standard output (for Tomcat that usually ends up in catalina.out). Collect the dumps and send them to the developer.
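To narrow things down, it can help to take a few dumps in a row and to match the busiest thread from top against the dump. The sketch below is only an illustration: the function names `tid_to_nid` and `request_dumps` are made up for this example, and it assumes a HotSpot JVM, which labels each thread in the dump with its native thread ID in hex as `nid=0x...`.

```shell
#!/bin/sh
# Convert a decimal thread ID (the PID column of `top -H`) to the hex
# form that a HotSpot thread dump prints as nid=0x...
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Send SIGQUIT (kill -3) to the JVM several times, pausing between dumps.
# Arguments: PID, number of dumps (default 3), seconds between dumps (default 2).
# The dumps go wherever the JVM's standard output goes.
request_dumps() {
    pid=$1
    count=${2:-3}
    interval=${3:-2}
    i=0
    while [ "$i" -lt "$count" ]; do
        kill -3 "$pid" || return 1
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

A possible session, using PID 6609 from the question: run `top -H -p 6609` to list the threads of that process, note the ID of the one using the CPU, run `request_dumps 6609`, then search the dump output for the value printed by `tid_to_nid` with that ID. The stack trace under that `nid` is the code that is burning CPU.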

Bart De Vos
  • Do more than one, with one/two seconds of gap in between, so that execution progress can be seen. One thread dump might be enough if the process is locked waiting for an event (and thus 0% CPU usage) but you may need to get a bunch of them if process is running. – Ochoto Aug 16 '11 at 09:52
  • This doesn't seem to do anything. What am I doing wrong? No console output and nothing in catalina.out. Where should I look for the result? – Norbert Bicsi Jan 29 '18 at 09:38

I would run VisualVM (if you're running the Oracle version of Java), attach to the process, then take a dump of the VM (make sure it's the most recent version of VisualVM/JDK possible). Also, there is a "detect deadlock" button in there somewhere (or maybe it was in jconsole). Then you can use a tool like jhat (or Eclipse) to view the dump.
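If no GUI is available on the server, roughly the same information can be collected with the JDK's command-line tools. This is a sketch under assumptions: it presumes a HotSpot JDK with `jstack`, `jmap` and `jhat` on the PATH, run as the same user as the Java process, and `has_deadlock` is a helper name invented for this example.

```shell
#!/bin/sh
# Return success if a saved thread dump contains a deadlock report.
# HotSpot introduces such a report with a "Java-level deadlock" line.
has_deadlock() {
    grep -q 'Java-level deadlock' "$1"
}

# Example session (6609 is the java PID from the question):
#   jstack -l 6609 > threads.txt                  # thread dump with lock details
#   has_deadlock threads.txt && echo "deadlock reported"
#   jmap -dump:format=b,file=heap.hprof 6609      # binary heap dump
#   jhat heap.hprof                               # browse the heap dump
```

A heap dump mostly helps when memory is the suspect. For a process pinned at 100% CPU, the thread dump is usually the more useful of the two.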

djangofan