
This server runs Ubuntu 12.04.5 LTS and is used as a web server (Apache 2.2.22) and an FTP server (ProFTPD 1.3.4a). When this happens, nobody can access web pages for 5 minutes or so. (This is a virtual private server.)

This is the output of the top command at that time:

top - 09:06:58 up 16 days, 14:29,  1 user,  load average: 36.01, 23.39, 10.79
Tasks: 161 total,  38 running, 123 sleeping,   0 stopped,   0 zombie
Cpu(s): 18.8%us, 56.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 25.0%si,  0.0%st
Mem:   8171872k total,  8043880k used,   127992k free,   164308k buffers
Swap:  2096124k total,        0k used,  2096124k free,  7007256k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5348 root      20   0 37536 7092 2104 R   43  0.1   0:43.40 archive.pl
 5354 www-data  20   0  174m 6984 4804 R   42  0.1   0:14.65 cron_job_creati
   35 root      20   0     0    0    0 R   41  0.0  11:40.57 kswapd0
   26 root      20   0     0    0    0 S   39  0.0   1:17.96 sync_supers
 5353 www-data  20   0  174m 6984 4804 R   38  0.1   0:13.32 cron_job_creati
 5352 www-data  20   0  174m 7232 4940 R   31  0.1   0:10.36 cron_job_creati
 5371 root      20   0 25108  308    0 R   24  0.0   0:00.87 master
 5358 www-data  20   0  296m 7476 1060 R   21  0.1   0:03.24 apache2
  948 root      20   0 25108 1604 1296 S   18  0.0   2:50.30 master
 5365 root      20   0  102m 2332  832 R   17  0.0   0:01.10 proftpd
  988 root      20   0  102m 2228  728 R   15  0.0  10:38.37 proftpd
 5317 www-data  20   0  304m  17m 3344 R   15  0.2   0:28.07 apache2
 5369 root      20   0  102m 1916  416 R   15  0.0   0:01.89 proftpd
 5225 www-data  20   0  305m  18m 3408 R   11  0.2   0:31.17 apache2
 5256 www-data  20   0  304m  17m 3344 R    9  0.2   0:29.83 apache2
 5254 www-data  20   0  303m  17m 3336 R    8  0.2   0:27.67 apache2
 5345 www-data  20   0  297m 8156 1720 R    6  0.1   0:04.92 apache2
 5357 root      20   0 17336 1356  972 R    4  0.0   0:04.48 top
 5368 www-data  20   0  296m 6976  624 R    3  0.1   0:00.98 apache2
 5363 root      20   0  102m 2128  628 R    1  0.0   0:01.45 proftpd
    1 root      20   0 24204 1760  912 S    0  0.0   1:31.62 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S    0  0.0   0:53.40 ksoftirqd/0
    5 root      20   0     0    0    0 S    0  0.0   0:00.49 kworker/u:0
    6 root      RT   0     0    0    0 S    0  0.0   0:00.62 migration/0
    7 root      RT   0     0    0    0 S    0  0.0   3:14.39 watchdog/0
    8 root      RT   0     0    0    0 S    0  0.0   2:49.77 migration/1
   10 root      20   0     0    0    0 S    0  0.0   0:12.92 ksoftirqd/1
   12 root      RT   0     0    0    0 S    0  0.0   4:01.38 watchdog/1
   13 root      RT   0     0    0    0 S    0  0.0   0:34.39 migration/2
   15 root      20   0     0    0    0 S    0  0.0   0:12.88 ksoftirqd/2
   16 root      RT   0     0    0    0 S    0  0.0   3:31.75 watchdog/2
   17 root      RT   0     0    0    0 S    0  0.0   1:10.81 migration/3
   19 root      20   0     0    0    0 S    0  0.0   0:08.38 ksoftirqd/3
   20 root      RT   0     0    0    0 S    0  0.0   3:30.11 watchdog/3
   21 root       0 -20     0    0    0 S    0  0.0   0:00.00 cpuset
   22 root       0 -20     0    0    0 S    0  0.0   0:00.00 khelper
   23 root      20   0     0    0    0 S    0  0.0   0:00.00 kdevtmpfs
   24 root       0 -20     0    0    0 S    0  0.0   0:00.00 netns
   25 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/u:1
   27 root      20   0     0    0    0 S    0  0.0   0:00.08 bdi-default
   28 root       0 -20     0    0    0 S    0  0.0   0:00.00 kintegrityd
   29 root       0 -20     0    0    0 S    0  0.0   0:00.00 kblockd
   30 root       0 -20     0    0    0 S    0  0.0   0:00.00 ata_sff
   31 root      20   0     0    0    0 S    0  0.0   0:00.00 khubd

I noticed that kswapd0 and sync_supers run only at this time. What are they? Swap should not be needed when there is enough physical memory, should it? archive.pl and the other commands starting with cron_ are scripts I wrote that run every 10 minutes. They access the hard disk a lot but have never caused the server to slow down. It gets back to normal after 5 minutes or so. Thanks in advance.

Asela
  • Check whether any cron jobs run at that time with crontab -l. Also, notice that your server load is through the roof in this output (30+). Does the server back up any MySQL databases at this time, or do your websites run any other tasks then? – David W May 14 '15 at 11:10
  • @DavidW thanks for your reply. Yes, archive.pl and the other commands starting with cron_ are cron jobs I wrote that run every 10 minutes, but they don't cause any problems at other times. There are no DB backups running at this time. Do you know what the 'sync_supers' command is? How can I check whether malware or anything else is running on my server? – Asela May 14 '15 at 11:38
  • A bit of searching suggests that some kernels have trouble with kswapd. Is it possible for you to upgrade your kernel? – alphamikevictor May 14 '15 at 11:50
  • @alphamikevictor thanks for your reply. My kernel version is 3.2.0-80-generic. I learned that the latest version is 4.0.3 (https://www.kernel.org/finger_banner). This is a production server, so I am a bit hesitant to update the kernel. Could updating the kernel cause anything to stop running? (I have never done a kernel update before :) ) – Asela May 14 '15 at 12:20
  • Perhaps you can upgrade the kernel, but instead of compiling it yourself (which can easily lead to problems if you are not used to it), just upgrade through the Ubuntu updates (if available). – alphamikevictor May 14 '15 at 12:25

1 Answer


One thing is certain: your load average is out of control (load average: 36.01, 23.39, 10.79). Judging by the kernel threads, the VM appears to have 4 vCPUs, so after normalizing per CPU the load average is about 9.00, 5.85, 2.70, which is still very high. Any per-CPU load average above 1.0 should be investigated for I/O, memory or CPU overload.
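
The per-CPU figures come from dividing each load average by the number of CPUs. A minimal sketch of that arithmetic, assuming 4 vCPUs as inferred from the kernel threads (the function name `normalize_load` is only illustrative):

```python
def normalize_load(load_averages, cpu_count):
    """Divide each load average by the CPU count to get a per-CPU figure."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return [round(value / cpu_count, 2) for value in load_averages]


# Values taken from the top header in the question. The CPU count of 4 is an
# assumption based on the watchdog/0..3 kernel threads in the process list.
per_cpu = normalize_load([36.01, 23.39, 10.79], cpu_count=4)
print(per_cpu)
```

On a live Linux system the same inputs could be read with `os.getloadavg()` and `os.cpu_count()` instead of being typed in by hand.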

In your case, disk I/O does not appear to be an issue: 0.0%wa shows the CPU is not spending any time waiting for I/O, and there do not appear to be any processes in D state.
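
To read the CPU breakdown without eyeballing it, the summary line can be split into named fields. This is a rough sketch that assumes the older top layout shown in the question (values written as `56.2%sy`); newer top versions print the fields differently, so the regular expression would need adjusting there.

```python
import re


def parse_cpu_line(line):
    """Parse a top 'Cpu(s):' summary line into a {field: percent} dict."""
    return {name: float(value)
            for value, name in re.findall(r"([\d.]+)%(\w+)", line)}


# Summary line copied from the top output in the question.
cpu = parse_cpu_line(
    "Cpu(s): 18.8%us, 56.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 25.0%si,  0.0%st"
)
print("iowait:", cpu["wa"])
print("kernel + softirq:", cpu["sy"] + cpu["si"])
```

For this sample, `wa` is zero while `sy` and `si` together account for most of the CPU time, so the time is going to kernel-side work rather than to waiting on disk.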

Memory - I don't see any swapping and the actual memory usage under RES looks good.

CPU - you have 38 running tasks, and in your top output you can see lots of processes in R state, all contending for CPU cycles.
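
Counting the state column makes the R versus D picture easier to see on a long listing. The sketch below assumes the column order of the top output in the question (state in the eighth column) and uses only the first four rows as sample input; `count_states` is an illustrative name.

```python
from collections import Counter


def count_states(top_rows):
    """Count the S (state) column across top process rows."""
    states = Counter()
    for row in top_rows.strip().splitlines():
        fields = row.split()
        # Columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) >= 12 and fields[0].isdigit():
            states[fields[7]] += 1
    return states


# First four process rows from the top output in the question.
sample = """
 5348 root      20   0 37536 7092 2104 R   43  0.1   0:43.40 archive.pl
 5354 www-data  20   0  174m 6984 4804 R   42  0.1   0:14.65 cron_job_creati
   35 root      20   0     0    0    0 R   41  0.0  11:40.57 kswapd0
   26 root      20   0     0    0    0 S   39  0.0   1:17.96 sync_supers
"""
states = count_states(sample)
print(states["R"], states["S"], states["D"])  # prints: 3 1 0
```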

I would recommend going through this Linux Journal link to troubleshoot the high load average. You can start by moving the cron jobs out of the 9:00 AM window, for instance.
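
To see which scheduled entries actually fall in that window, the minute and hour fields of each crontab line can be compared against the time of interest. This is a deliberately simplified sketch: it only understands `*` and plain integers (not ranges, lists or `*/10` steps), it ignores the day and month fields, and the schedules are the ones from the /etc/crontab quoted in the comments. Per-user crontabs (`crontab -l`) and /etc/cron.d would need checking separately.

```python
def field_matches(field, value):
    """Match one cron field; only '*' and plain integers are handled here."""
    return field == "*" or (field.isdigit() and int(field) == value)


def fires_at(schedule, minute, hour):
    """Return True if the minute and hour fields of a cron schedule match."""
    cron_minute, cron_hour = schedule.split()[:2]
    return field_matches(cron_minute, minute) and field_matches(cron_hour, hour)


# Schedules from the /etc/crontab quoted in the comments; day-of-month,
# month and day-of-week are ignored in this simplified check.
schedules = ["17 * * * *", "25 6 * * *", "47 6 * * 7", "52 6 1 * *"]
matches = [s for s in schedules if fires_at(s, minute=0, hour=9)]
print(matches)  # prints: []
```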

Daniel t.
  • Thanks very much for your reply; it helped a lot to narrow down the possible reasons. As you said, and as the link you provided points out, it must be a CPU load issue. I stopped my crons, but still no luck. Can you please tell me how I can check whether operating system updates or any other system tasks run at this time (9:00 AM)? 56.2%sy Thanks again for your reply – Asela May 27 '15 at 11:07
  • My /etc/crontab reads as `# m h dom mon dow user command 17 * * * * root cd / && run-parts --report /etc/cron.hourly 25 6 * * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) 47 6 * * 7 root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly ) 52 6 1 * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly ) #` so nothing I could find starts at 9.00AM – Asela May 27 '15 at 11:17