
Over the past 10 days I've suddenly been experiencing increased CPU usage on my website, with CPU usage reaching 100% three times. During those spikes I'm unable to SSH into the server, so I haven't been able to figure out what is actually using that much CPU, but I have a feeling it has something to do with Apache. My website is a Django application using Apache2, PostgreSQL and Memcached, hosted on a DigitalOcean droplet (512MB RAM, 20GB SSD disk, Ubuntu 14.04 x64).

Here is the current output of top:

top - 16:15:31 up 19:12,  1 user,  load average: 0.01, 0.09, 0.46
Tasks:  78 total,   2 running,  76 sleeping,   0 stopped,   0 zombie
%Cpu(s):  9.5 us,  1.9 sy,  0.0 ni, 85.0 id,  3.3 wa,  0.3 hi,  0.0 si,  0.0 st
KiB Mem:    501868 total,   495576 used,     6292 free,    41836 buffers
KiB Swap:        0 total,        0 used,        0 free.   152976 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
15036 www-data  20   0  906084 156828   2448 S 19.3 31.2  12:08.08 apache2
    1 root      20   0   33472   1300      0 S  0.0  0.3   0:02.09 init
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.02 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:01.63 ksoftirqd/0
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H
    7 root      20   0       0      0      0 S  0.0  0.0   0:10.60 rcu_sched
    8 root      20   0       0      0      0 R  0.0  0.0   0:29.05 rcuos/0
    9 root      20   0       0      0      0 S  0.0  0.0   0:00.00 rcu_bh
   10 root      20   0       0      0      0 S  0.0  0.0   0:00.00 rcuob/0
   11 root      rt   0       0      0      0 S  0.0  0.0   0:00.00 migration/0
   12 root      rt   0       0      0      0 S  0.0  0.0   0:01.07 watchdog/0
   13 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 khelper
   14 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kdevtmpfs
   15 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 netns
   16 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 writeback
   17 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kintegrityd
   18 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 bioset
   19 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/u3:0

Here is the Apache conf file:

<VirtualHost *:80>

    RewriteEngine On
    RewriteCond %{HTTP_HOST}  ^example.com [nocase]
    RewriteRule ^(.*)         http://www.example.com$1 [last,redirect=301]

    ServerName example.com
    ServerAlias www.example.com
    ServerAdmin admin@example.com

    WSGIDaemonProcess example python-path=/home/abc/example:/home/abc/example/env/lib/python2.7/site-packages
    WSGIProcessGroup example
    WSGIApplicationGroup %{GLOBAL}
    WSGIScriptAlias / /home/abc/example/wsgi.py

    DocumentRoot /home/abc/example

    <Directory />
        Require all granted
    </Directory>

    Alias /static/ /home/abc/example/static/

    <Directory /home/abc/example/static>
        Order deny,allow
        Allow from all
    </Directory>

    Alias /media/ /home/abc/example/media/

    <Directory /home/abc/example/media>
        Order deny,allow
        Allow from all
    </Directory>

    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined

</VirtualHost>

Here are the graphs of CPU, disk and bandwidth usage over the past month:

[Graphs: Monthly CPU Usage | Monthly Disk Usage | Monthly Bandwidth Usage]

As you can see, the CPU usage spiked three times over the past couple of days, and each time I had to reboot my droplet to bring it back down. The CPU usage generally varied between 0-5% up until a few days ago, but it has increased now.

The response time of the website has also increased to about 4 seconds. Everything seemed to be working fine until around 10 days ago, and I haven't made any changes to my configuration since then. There hasn't been an increase in traffic either; daily traffic is around 1500 visitors. How can I identify what's causing this issue? Any ideas?

Yin Yang
  • You need to look into this while it is under the heavy load. Your top snapshot clearly shows it was taken during a normal working cycle, not while running with 0% CPU idle. – drookie Nov 14 '14 at 11:45
  • @EugeneM.Zheganin I know that. But I'm not able to SSH in when the CPU load reaches 100%, so there isn't a way for me to run top at that time. Also, the response time of the website is increased even when CPU usage is normal. Why could that be? – Yin Yang Nov 14 '14 at 12:03
  • This could be caused by a billion reasons, so it's very counterproductive to guess. You can open an SSH session to this host and wait for the outage to happen (and see what's happening there, but your session will probably freeze too). Or you can set up monitoring software, like Zabbix or Cacti, and seek the answer in the histograms, like "was my CPU increasing suddenly, or over some time", "what was the traffic on an interface at this time", "what was the number of connections" and so on. – drookie Nov 14 '14 at 12:15
  • @EugeneM.Zheganin I know there could be a billion reasons, and that's why I can't figure out how to go about this. I've added the graph of CPU usage for the past month. It happens suddenly and is not linked to the amount of traffic on the website. – Yin Yang Nov 14 '14 at 12:33
  • 2
    Have you got any other metrics for memory, swap, I/O, disk etc during these events ? If not install monitoring and gather information. Be aware though that with just 512MB of memory your probably overloading the droplet. – user9517 Nov 14 '14 at 12:42
  • @Iain I've added the graphs for Disk and Bandwidth usage for the same time period. What would be the ideal memory for a website with an average traffic of 1500 visitors per day? – Yin Yang Nov 14 '14 at 12:54
  • I would try installing atop and having it record historical data so you can see what's happening when you're unable to log in. Also, is the OOM killer getting activated? You can check this by doing the following on most Linux distros: grep -i kill /var/log/messages EDIT: Be wary, atop can generate a LOT of logs and take up a LOT of space. – Ryan Smith Jun 03 '19 at 20:06
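
Building on the suggestions above, one lightweight way to capture historical data when an interactive session may freeze is to record periodic process snapshots and read them back after a spike or reboot. Below is a minimal sketch in Python 2.7 (the version the server already runs); the log path, snapshot size and script name are arbitrary choices for illustration, not part of the original setup.

#!/usr/bin/env python
"""Append a timestamped snapshot of the busiest processes to a log file."""
import datetime
import subprocess

LOG_FILE = "/var/log/proc-snapshot.log"  # hypothetical path; adjust as needed
TOP_N = 15                               # number of processes to record

def snapshot():
    # List all processes sorted by CPU usage, highest first.
    output = subprocess.check_output(["ps", "aux", "--sort=-%cpu"])
    lines = output.splitlines()
    header, rows = lines[0], lines[1:TOP_N + 1]
    with open(LOG_FILE, "a") as log:
        log.write("==== %s ====\n" % datetime.datetime.now().isoformat())
        log.write(header + "\n")
        log.write("\n".join(rows) + "\n\n")

if __name__ == "__main__":
    snapshot()

An /etc/cron.d entry such as * * * * * root /usr/bin/python /usr/local/bin/proc_snapshot.py (the script location is hypothetical) then builds a minute-by-minute history that survives a reboot, so whatever was hogging the CPU during a spike can be identified afterwards.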

3 Answers


The bandwidth graphs show an increase in inbound traffic, i.e. traffic going to the server, and also an increase in the disk write rate. This makes me suspect some kind of attack on the server. The best security approach here is to change ALL server passwords. This is essential in order to identify whether it is a bug in some software or some sort of successful breach of your server.

sikas

If sikas' theory is correct that this is caused by external traffic, then you need to tune Apache better. Cut the ServerLimit by half or more and see if it survives better next time.
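
As a minimal sketch of what that tuning might look like with the prefork MPM on a 512MB droplet (the exact numbers are assumptions and should be derived from the measured memory footprint of one Apache child, e.g. from ps or top):

# /etc/apache2/mods-available/mpm_prefork.conf
# Rough values for a 512MB droplet; measure per-child RSS and adjust.
<IfModule mpm_prefork_module>
    StartServers             2
    MinSpareServers          2
    MaxSpareServers          4
    ServerLimit             10
    MaxRequestWorkers       10
    MaxConnectionsPerChild 500
</IfModule>

Capping MaxRequestWorkers keeps the total Apache memory within the droplet's RAM, so a traffic burst queues connections instead of exhausting memory outright (the top output shows no swap configured).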

chicks

You should definitely try using Cloudflare, at least the free version. Your server will have more headroom without the need to serve static resources, and your site will be protected against the most common attacks.

Beto Aveiga