
I am facing a strange issue on an Oracle server. The server is under high CPU load every day for around 5 hours.

This starts every day at around 03:46 AM and continues for the next 5 hours. The load drops immediately when I log in (via SSH) or after 5 hours, whichever comes first, as shown below:

Time (IST)      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
07:27:01  IST        12       700     12.42     12.28     12.22
07:28:01  IST        13       699     12.41     12.31     12.23
07:29:01  IST        13       701     12.31     12.31     12.24
07:30:01  IST        14       708     12.53     12.34     12.25
07:31:02  IST        16       707     13.90     12.78     12.40
07:32:01  IST        14       708     13.46     12.86     12.46
07:33:01  IST        12       704     13.39     12.94     12.51
07:34:01  IST         0       684      9.41     12.08     12.25
07:35:01  IST         2       685      4.44     10.16     11.58
07:36:01  IST         2       685      2.26      8.49     10.91
07:37:01  IST         2       687      1.06      7.02     10.25
07:38:01  IST         0       687      0.76      5.84      9.64
07:39:01  IST         0       682      0.72      4.89      9.08
07:40:01  IST         0       682      0.45      4.06      8.53
07:41:01  IST         1       684      0.46      3.41      8.02
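As a sketch of how to pin down exactly when the drop happens, the snippet below parses rows laid out like the `sar -q` output above and reports the first sample where the instantaneous run-queue size (runq-sz) falls from busy to near idle. The helper names `parse_sar_q` and `first_drop` and the busy threshold of 5 are illustrative assumptions, not part of sysstat. Because ldavg-1 is an exponentially damped average, it keeps falling for several minutes after the run queue empties, so runq-sz marks the moment of change more directly.

```python
import sys
from typing import List, NamedTuple, Optional


class SarQRow(NamedTuple):
    """One data row of sar -q style output."""
    time: str
    runq_sz: int
    plist_sz: int
    ldavg_1: float
    ldavg_5: float
    ldavg_15: float


def parse_sar_q(text: str) -> List[SarQRow]:
    """Parse rows shaped like 'HH:MM:SS TZ runq plist l1 l5 l15', skipping headers and blanks."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 7 or parts[0].count(":") != 2:
            continue  # header, blank line, or unrelated text
        try:
            rows.append(SarQRow(parts[0], int(parts[2]), int(parts[3]),
                                float(parts[4]), float(parts[5]), float(parts[6])))
        except ValueError:
            continue  # non-numeric columns: not a data row
    return rows


def first_drop(rows: List[SarQRow], busy: int = 5) -> Optional[SarQRow]:
    """Return the first row with runq-sz <= busy right after a row above it (threshold is arbitrary)."""
    for prev, cur in zip(rows, rows[1:]):
        if prev.runq_sz > busy >= cur.runq_sz:
            return cur
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 sar_drop.py saved_sar_q_output.txt
    with open(sys.argv[1]) as f:
        rows = parse_sar_q(f.read())
    drop = first_drop(rows)
    if drop is None:
        print("no busy-to-idle transition found")
    else:
        print(f"run queue emptied at {drop.time}; ldavg-1 was still {drop.ldavg_1}")
```

On the rows above this picks 07:34:01, where runq-sz goes from 12 to 0 while ldavg-1 is still 9.41; plist-sz also falls from 704 to 684 at that sample. That timestamp could then be compared with the SSH login time recorded by sshd (on EL6 these entries go to /var/log/secure).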

There is no specific process running at this time. What else can be checked to diagnose it? I am running:

Linux 2.6.32-573.7.1.el6.x86_64 #1 SMP Tue Sep 22 22:00:00 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

I recently upgraded the kernel and saw a slight reduction in load (from 18 to 12), but the problem remains. The load still drops as soon as I SSH in. I am looking for an explanation.
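One thing that can be checked while the load is high: on Linux the load average counts tasks that are runnable (state R) plus tasks in uninterruptible sleep (state D, typically blocked on I/O), so listing which threads sit in those two states shows what is actually feeding the number even when no single process stands out. The sketch below reads /proc directly; the helper names are hypothetical, and it assumes a Python 3 interpreter is available (the system Python on EL6 is 2.6, so it may need adapting).

```python
import os
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def parse_stat(line: str) -> Tuple[str, str]:
    """Return (comm, state) from a /proc/<pid>/task/<tid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on whitespace.
    """
    start = line.index("(")
    end = line.rindex(")")
    return line[start + 1:end], line[end + 2]  # state letter follows ") "


def iter_tasks(proc: str = "/proc") -> Iterator[Tuple[str, str]]:
    """Yield (comm, state) for every thread; tasks that exit mid-scan are skipped."""
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited between the two listdir calls
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    yield parse_stat(f.read())
            except (OSError, ValueError, IndexError):
                continue


def load_contributors(tasks: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count task names in the two states that Linux includes in the load average."""
    groups = {"R": Counter(), "D": Counter()}
    for comm, state in tasks:
        if state in groups:
            groups[state][comm] += 1
    return groups


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        print("loadavg:", f.read().strip())
    groups = load_contributors(iter_tasks())  # note: this script itself shows up as R
    for state, label in (("R", "runnable"), ("D", "uninterruptible")):
        print(f"{label} ({state}): {sum(groups[state].values())}")
        for comm, n in groups[state].most_common(10):
            print(f"  {n:4d}  {comm}")
```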

  • So I blocked a few IPs identified as sources of bad logins, but the problem remains the same. The kernel was patched because I was seeing [migration] -t12 in top during the high-load period, but it did not solve the problem. – Somesh Pant Oct 01 '15 at 07:19
  • Load average is not the same as CPU usage. See [Wikipedia load article](https://en.wikipedia.org/wiki/Load_(computing)). Several factors contribute to the load average; one of them is I/O wait. If your I/O rate decreases, you might see a concurrent increase in load average. Figure out which key system metric is actually causing the high load average. Also, is your system virtual or physical? Are there any known reasons a hypervisor might deprioritize a VM when there are no active terminal sessions? – Larry Silverman Oct 06 '15 at 20:09
  • The server in question is a physical server. I have checked I/O wait as well and don't see anything contributing to this load. Yes, it's load average, not CPU load; thanks for correcting me. The load average decreases rapidly when I SSH in, along with runq-sz. Any more suggestions? – Somesh Pant Oct 07 '15 at 06:56
  • Have you checked your crons and cron.daily for jobs? Did you personally build/configure this server or was it handed to you? Can you set up a cron job to dump output from top into a file so you can see what it records when you're not shelled in? – Larry Silverman Oct 07 '15 at 14:34
  • Yes, there is nothing in the crons. This server was handed over to me. It was the dumped top output that showed migration -t12. Other things look normal. – Somesh Pant Oct 08 '15 at 03:17
  • Not sure what else to recommend except opening an SR with Oracle. A similar topic was discussed in this thread in 2014: https://community.oracle.com/thread/3581966?start=0&tstart=0 – Larry Silverman Oct 14 '15 at 02:11
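Following the suggestion to dump top-like output from cron, so there is a record of what runs while nobody is logged in, here is a minimal sketch. It takes two /proc snapshots a few seconds apart, computes per-process CPU percentages from the utime and stime fields of /proc/<pid>/stat (the counters top also reads), and appends the busiest processes plus the contents of /proc/loadavg to a log file. Kernel threads such as migration/N appear as ordinary entries. The script name, log path, and 5-second interval are assumptions, and it assumes a Python 3 interpreter is available on the host.

```python
import os
import sys
import time
from typing import Dict, List, Tuple

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second for /proc/<pid>/stat times


def cpu_ticks(stat_line: str) -> Tuple[str, int]:
    """Return (comm, utime + stime) from one /proc/<pid>/stat line."""
    end = stat_line.rindex(")")  # comm may contain spaces or ')'
    comm = stat_line[stat_line.index("(") + 1:end]
    fields = stat_line[end + 2:].split()  # fields[0] is stat field 3 (state)
    return comm, int(fields[11]) + int(fields[12])  # stat fields 14 (utime), 15 (stime)


def snapshot(proc: str = "/proc") -> Dict[int, Tuple[str, int]]:
    """Map pid -> (comm, cumulative CPU ticks) for every process currently in /proc."""
    snap = {}
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc, pid, "stat")) as f:
                snap[int(pid)] = cpu_ticks(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was unreadable
    return snap


def top_consumers(before, after, interval_s: float, n: int = 10) -> List[Tuple[float, int, str]]:
    """Return (cpu_percent, pid, comm) for the n busiest processes; 100% = one full CPU."""
    rows = []
    for pid, (comm, ticks) in after.items():
        if pid in before and before[pid][0] == comm:  # skip new or recycled pids
            used = ticks - before[pid][1]
            rows.append((100.0 * used / (CLK_TCK * interval_s), pid, comm))
    rows.sort(reverse=True)
    return rows[:n]


def log_snapshot(path: str, interval_s: float = 5.0) -> None:
    """Append a timestamped loadavg line and the busiest processes to `path`."""
    first = snapshot()
    time.sleep(interval_s)
    second = snapshot()
    with open("/proc/loadavg") as f:
        load = f.read().strip()
    with open(path, "a") as log:
        log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} loadavg {load}\n")
        for pct, pid, comm in top_consumers(first, second, interval_s):
            log.write(f"  {pct:6.1f}%  {pid:>7}  {comm}\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    log_snapshot(sys.argv[1])  # argv[1] is the log file to append to
```

A hypothetical crontab entry to run it every minute would be `* * * * * /usr/bin/python3 /root/loadsnap.py /var/tmp/loadsnap.log` (both paths are placeholders). Reading /proc directly, rather than parsing `top -b` output, keeps the log format stable across procps versions.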
