
I've recently had a bit of a strange problem with my web server. Over the last day or so the site seems to be slowing down at seemingly random intervals. We don't appear to be experiencing any extra traffic, but a quick look at 'top' shows httpd jumping from 3-10% CPU to around 99%, briefly dropping to the mid-80s, then going back down. For example:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
2443 apache    25   0  256m  20m 5472 R 88.2  2.1   3:22.29 httpd

This seems to happen every 30 minutes or so. The strange thing is that while this is happening, Apache's server-status page will report (for example):

CPU Usage: u700.5 s6.22 cu0 cs0 - 20.2% CPU load

So my question is two-fold:

  1. Does anyone know why this issue may have cropped up over the last day or so? (There have been no changes made to the server.)
  2. Why would my CPU usage stats in top be vastly higher than in server-status, and which is correct?
stukerr

4 Answers


Check your access logs. You may have a dataminer or crawler hitting every page on your site, since the interval is so regular.
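If nothing jumps out from the raw log, grouping requests by user agent can make a crawler obvious. A sketch, assuming the common combined log format and a typical log path (adjust both for your setup):

awk -F'"' '{print $6}' /var/log/httpd/access_log | sort | uniq -c | sort -rn | head

The -F'"' splits each line on double quotes, so $6 is the User-Agent field; the output is a count per agent with the busiest first.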

Hyppy
  • Hi, thanks for the response. I've been through the logs and really can't find anything out of the ordinary. Would this also show in the server-status page? – stukerr May 10 '11 at 16:42

The CPU usage on Apache's server-status page is the average usage since Apache was started, so it won't show spikes like this. When you get one of these load spikes you can check the server-status page to see which pages/clients are being served (ExtendedStatus must be on).
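If ExtendedStatus isn't already enabled, a fragment like this in the Apache configuration turns it on (Apache 2.2-era syntax; the file location and the access rules below are assumptions, adjust for your setup):

ExtendedStatus On
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>

After a graceful restart (apachectl graceful) the status page will show per-request detail.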

You can also use netstat to see what clients are currently accessing your machine:

 netstat -an | grep ESTABLISHED

If you run this over several hours, including during the traffic spikes, you may be able to spot a recurring IP address and potentially trace it to a specific robot/crawler. If that turns out to be the case, you can use robots.txt to limit how well-behaved robots crawl your site.
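For example, a minimal robots.txt served from the site root (the values here are illustrative; note that Crawl-delay is honored by some crawlers but was never part of the original robots.txt standard):

User-agent: *
Crawl-delay: 10

Badly behaved bots will ignore this entirely, so it only helps with the polite ones.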

Edit: On a busy server the above netstat command should show some entries like:

tcp        0      0 10.2.212.13:80              216.146.52.21:24979         ESTABLISHED
tcp        0      0 10.2.212.13:80              86.174.113.138:54901        ESTABLISHED
tcp        0      0 10.2.212.13:80              94.1.216.253:51204          ESTABLISHED
tcp        0      0 10.2.212.13:80              24.9.61.204:62936           ESTABLISHED

The client's IP address is the one on the right. If you only see one or two lines, it just means that at that moment the only connection is your own ssh session; check again when your load increases. You can also remove the grep to list all connections, although this will include a large number of stale TIME_WAIT entries.
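To get a quick count of connections per client IP at any given moment, you can extend the same command (a sketch assuming IPv4 addresses; the fifth column of netstat's output is the foreign address):

netstat -an | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head

Run it during a spike and any single IP holding dozens of connections will stand out at the top.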

I would start with the extended server-status and see if that can reveal any obvious crawlers during traffic peaks.

uesp
  • Thanks for explaining why the server-status looks different. I'm afraid I'm a bit of a novice at the Linux command line; when I run netstat -an | grep ESTABLISHED it just gives me the IP address of the server and the SSH port I'm on, then the IP address of my ADSL connection. How would I run this to get the IP addresses of any robots/crawlers over several hours as you suggested? Is there any way to automate that? – stukerr May 10 '11 at 16:52
  • Thanks for the help. It turned out that my access log wasn't logging images (to save space) and someone had hotlinked a large image from some high-usage filesharing site, so I managed to stop that in .htaccess – stukerr May 11 '11 at 13:14

Create a simple executable script:

#!/bin/sh
# use numeric IPs
netstat -na | grep ESTABLISHED
# use DNS names instead
# netstat -ta | grep ESTABLISHED

Without the -n flag, netstat resolves and prints DNS names, so uncomment whichever form you prefer.

Then schedule it to run at an interval, for example with cron (read the crontab man page; depending on how the system is configured you may not even be allowed to use it). You will want to append the output to a log file for future use, and you can add a date command to the script if you want to record the time of each run. Example crontab entry:

#minute hour dayofmonth monthofyear dayofweek
0,15,30,45 * * * * <path/to/script> >> <log>

This is edited with crontab -e (again read the man page).

You can use this to list the most frequent client IPs in your access log (the busiest appear at the bottom of the output):

awk '{print $1}' access_log | sort | uniq -c | sort -n

If you are really seeing slow page responses, also look at I/O wait; sometimes "high" CPU use by itself isn't a big deal.
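For a quick look at I/O wait, vmstat's wa column shows the percentage of CPU time spent waiting on disk I/O (vmstat ships with the procps package on most Linux distributions):

vmstat 5 3

The first line is an average since boot; watch the wa column on the later samples, taken 5 seconds apart.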

Schrute

I suspect that you are running on a quad-core CPU. In that case you could easily end up in a situation where top reports %CPU per core (so one busy process can show close to 100% of a single core), whereas other tools divide that figure by the number of cores to arrive at an overall load figure for the CPU as a whole.
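To check how many cores the OS actually sees, something like this works on most Linux systems:

grep -c ^processor /proc/cpuinfo

If that prints 1, this explanation doesn't apply and the spikes really are saturating the only core.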

As far as the variations are concerned, I am inclined to give the same advice as Hyppy.

wolfgangsz
  • Hi, thanks for the response. According to the OS the server has an AMD Athlon(tm) 64 Processor 3500+, which I believe is just a single-core processor. – stukerr May 10 '11 at 16:46