
An Ubuntu 14.04 server on a popular VPS provider hosts a single website with Nginx, Apache and WordPress/PHP. After it starts, it works fine for a while. Days later, the VPS provider shows the server pegged at 100% CPU utilization, SSH connections time out, and the website becomes inaccessible. Before the server hit 100% CPU, it could be accessed via SSH without issue. There is also little to no load on the server that would explain the CPU use; typically, CPU use stays under 3%.

How can I diagnose which process is causing the sudden CPU spike without being able to log in? Currently, when this happens the server is rebooted, and it then runs fine for a few more days. A virus or other malware is suspected, although ClamAV didn't find anything.

sean2078
  • You could always just guess wildly, that's about as much as you can achieve without actually being able to troubleshoot the issue. – Mark Riddell Jun 23 '16 at 17:23
  • I would start with a simple shell script running `ps aux` or `ps waxl` every minute and appending the output to a file. – ott-- Jun 23 '16 at 20:09
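ott--'s suggestion works, with one caveat: the `%CPU` column of `ps` is CPU time divided by the process's lifetime, so a worker that sat idle for days and only just started spinning can still look quiet. The sketch below instead samples `/proc/<pid>/stat` twice and reports each process's CPU use over the gap, which is roughly what `top` does. It is a minimal sketch, assuming a Linux `/proc` filesystem and a cron entry that runs it every minute; `LOG_PATH`, `TOP_N` and `INTERVAL` are hypothetical values, and it avoids f-strings so it also runs on the Python 3.4 that ships with Ubuntu 14.04.

```python
#!/usr/bin/env python3
"""Log the top CPU consumers, measured over a short interval, to a file.

Sketch only: assumes Linux /proc and a cron entry that runs this script
every minute. LOG_PATH, TOP_N and INTERVAL are hypothetical values.
"""
import os
import time

LOG_PATH = "/var/log/cpu-snapshots.log"  # hypothetical path on persistent disk
TOP_N = 10       # processes to record per snapshot
INTERVAL = 2.0   # seconds between the two samples


def parse_stat(data):
    """Return (comm, utime + stime in clock ticks) from a /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')'
    lparen, rparen = data.find("("), data.rfind(")")
    comm = data[lparen + 1:rparen]
    fields = data[rparen + 2:].split()  # fields[0] is stat field 3 (state)
    utime, stime = int(fields[11]), int(fields[12])  # stat fields 14 and 15
    return comm, utime + stime


def read_ticks():
    """Map pid -> (comm, cumulative CPU ticks) for every visible process."""
    ticks = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/{}/stat".format(entry)) as f:
                ticks[int(entry)] = parse_stat(f.read())
        except (IOError, OSError):
            pass  # the process exited between listdir() and open()
    return ticks


def cpu_percent(before, after, hz, interval):
    """Per-process CPU % over the interval, highest first.

    As in top's default mode, a multi-threaded process can exceed 100%.
    """
    usage = []
    for pid, (comm, t1) in after.items():
        t0 = before.get(pid, (comm, 0))[1]  # new process: all its ticks are recent
        if t1 < t0:
            t0 = 0  # PID was reused by a new process during the interval
        pct = 100.0 * (t1 - t0) / hz / interval
        usage.append((pct, pid, comm))
    usage.sort(reverse=True)
    return usage


def append_snapshot(path=LOG_PATH, n=TOP_N, interval=INTERVAL):
    """Sample twice, then append a timestamped top-n list to the log file."""
    hz = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second
    before = read_ticks()
    time.sleep(interval)
    after = read_ticks()
    with open(path, "a") as log:
        log.write("=== {} load={} ===\n".format(
            time.strftime("%Y-%m-%d %H:%M:%S"), os.getloadavg()))
        for pct, pid, comm in cpu_percent(before, after, hz, interval)[:n]:
            log.write("{:6.1f}% {:>7} {}\n".format(pct, pid, comm))


if __name__ == "__main__":
    append_snapshot()
```

A crontab line such as `* * * * * /usr/bin/python3 /usr/local/bin/cpu_snapshot.py` (the script path is hypothetical) runs it every minute. After the next lockup and reboot, the last few snapshots in the log show what was running as the CPU climbed; if the machine becomes too starved for cron to run, the final entry is still the latest state that was recorded.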

1 Answer


The bigger question is how to set up a way to see what is happening on the machine in the first place. During those lockups, are there also spikes in memory or network use? Can you ask your VPS provider whether there is a noisy neighbor on your node? What do you see in your access logs: is Google crawling you and causing a burst of traffic all at once? If you are running everything on one box, it can be harder to diagnose, but the answers should be there.
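To check the access-log question, here is a minimal sketch that counts requests per minute, per client IP and per user agent, so that a burst from a crawler or a single client stands out next to the time the CPU pegged. It assumes Nginx's default `combined` log format and the Ubuntu default path `/var/log/nginx/access.log`; if your format or path differs, adjust `COMBINED` or `LOG_PATH` accordingly.

```python
#!/usr/bin/env python3
"""Summarise an Nginx access log: busiest minutes, top client IPs and agents.

Sketch only: assumes the default "combined" log format and the Ubuntu
default path /var/log/nginx/access.log; adjust LOG_PATH if yours differ.
"""
import collections
import re
import sys

LOG_PATH = "/var/log/nginx/access.log"

# $remote_addr - $remote_user [$time_local] "$request" $status $bytes "$referer" "$agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarise(lines):
    """Return Counters keyed by minute, client IP and user agent."""
    per_minute = collections.Counter()
    per_ip = collections.Counter()
    per_agent = collections.Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines written in some other format
        # time_local looks like 23/Jun/2016:17:23:05 +0000; keep up to the minute
        per_minute[m.group("time")[:17]] += 1
        per_ip[m.group("ip")] += 1
        per_agent[m.group("agent")] += 1
    return per_minute, per_ip, per_agent


def main(path=LOG_PATH, n=5):
    """Print the n largest entries of each counter."""
    with open(path, errors="replace") as f:
        per_minute, per_ip, per_agent = summarise(f)
    for title, counter in (("Busiest minutes", per_minute),
                           ("Top client IPs", per_ip),
                           ("Top user agents", per_agent)):
        print(title)
        for key, count in counter.most_common(n):
            print("  {:6d}  {}".format(count, key))


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else LOG_PATH)
```

Passing a rotated log, e.g. `python3 log_summary.py /var/log/nginx/access.log.1` (script name hypothetical), lets you inspect the file that covers the time of the lockup.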

When you are locked out, you do have a few generic options:

  1. An IPMI or console interface provided by the VPS provider. Most small-scale providers (i.e. not Google/AWS) supply one. It depends on the underlying infrastructure, but it is usually out-of-band (OOB) SSH access to the machine's console, or a true iKVM display provided by a Java applet. Look at the machine details in your provider's dashboard.

  2. Did you have any tools such as New Relic, or Filebeat shipping to ELK, on the machine? They will often still phone home even when you are locked out.

Brennen Smith