I recently made the move to The Cloud from a bare metal server for personal use and I couldn't be happier. Except for one thing: My server crashes about once a week.

Rackspace has been really great and extremely helpful and I wish to stay with them and recommend them to others. But with this issue I have been told that the server is just running out of memory and I need to look into it. While I'm fine with that and this does seem like my issue, I've never heard of a Linux machine crashing from running out of memory. In my experience it would, worst case scenario, run really slowly or the kernel would start killing processes.

Some Details:

  • Running Gentoo Linux (up to date)
  • 512 MB RAM, 1 GB swap
  • Services installed & running:
    • 2 Apache 2 procs (1 minimal, for serving static/cached resources and proxying; 2nd has cgi, mod_perl and mod_jk)
    • 2 Tomcat Instances (1 has 2 apps I made, the other is just for Nexus)
    • MySQL
  • When the crash happens:
    • server can be pinged
    • ssh connection hangs indefinitely
    • console accepts a username, but the password prompt never comes up. After 60 seconds it times out and I'm presented with another login prompt. Repeat.
    • services (http, tomcat, mysql) cannot be contacted; the connection hangs without reporting a 404, server not found, etc.
    • soft reboot does not work; a hard reboot is needed, often more than once (even with a 30-minute wait between attempts)

I've run this same setup on servers throughout the years but with only 256MB RAM. The only difference here is I'm on a virtual machine.

My question could take one of two forms: Has anyone had a similar problem with Rackspace Cloud or other cloud hosts (and if so, what was the solution)? Or: what's a good way to track down my issue? I set up a cron job that writes the output of uptime and free to a file every minute so I can examine it after the next crash, but that seems hacky.
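For reference, the per-minute logging described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the log path and the function names are hypothetical, and it reads `/proc/meminfo` and `/proc/loadavg` directly rather than shelling out to `uptime` and `free`.

```python
import time

# Hypothetical log location (an assumption); any writable path works.
LOG_PATH = "/var/log/memwatch.log"


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def format_snapshot(meminfo, loadavg, timestamp):
    """Build one log line from parsed meminfo and the 1-minute load average."""
    fields = ("MemFree", "Buffers", "Cached", "SwapFree")
    mem = " ".join(f"{key}={meminfo.get(key, 0)}kB" for key in fields)
    return f"{timestamp} load1={loadavg} {mem}"


def log_snapshot(log_path=LOG_PATH):
    """Append a single timestamped snapshot to log_path."""
    with open("/proc/meminfo") as f:
        meminfo = parse_meminfo(f.read())
    with open("/proc/loadavg") as f:
        loadavg = f.read().split()[0]  # first field is the 1-minute average
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(format_snapshot(meminfo, loadavg, stamp) + "\n")
```

A script that calls `log_snapshot()` could then be run from cron every minute (`* * * * *`). After a hang, the last few lines before the gap in timestamps should show whether free memory and swap were being exhausted.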

Full Disclosure: I'm a software developer by trade so that's where most of my experience is, but I have about 15 years experience using Linux for desktop and servers both for personal and professional use.

Josh Johnson
  • Really slow can be indistinguishable from crashed for a remote user. If it can be pinged it's not truly crashed. Rackspace's explanation sounds reasonable to me. – ceejayoz May 18 '12 at 14:32
  • @ceejayoz Agreed. Users aside, if I could at least get in while it was "really slow" I could diagnose the issue. In this case I've waited hours to get into the server in hopes to catch a runaway process but it eventually needed to be rebooted. – Josh Johnson May 18 '12 at 14:36
  • You should get as much status information from your server as possible. Are you monitoring its load at intervals of a few minutes? If so, see whether any strange patterns show up before the server stops responding. – Dima Chubarov May 18 '12 at 14:44
  • A tool like [ganglia](http://ganglia.sf.net) could be helpful. – Dima Chubarov May 18 '12 at 14:50
  • @DmitriChubarov Much appreciated, I've installed ganglia and look forward to seeing the data. – Josh Johnson May 18 '12 at 15:41
  • Look for oom-killer in /var/log/syslog or /var/log/messages around the time of the 'crash'. – EightBitTony May 18 '12 at 16:16
  • Do you have access to the virtual console? That might at least give you some information e.g. kernel panic messages. And it might allow you to log in if it really is just slowed down because you don't need the additional resources for a network connection. – Bram May 18 '12 at 16:16
  • After you bounce the machine, take a look at the `sar` logs (if you have that running). If you are lucky, you will see a spike in memory usage. Also what @EightBitTony said. – chutz May 18 '12 at 17:21
  • Nothing out of the ordinary in `/var/log` and friends. I do have access to the virtual console but it doesn't show anything but a login prompt. No sign of kernel panic. Installed `sar` and will def look after next "crash". Thanks! – Josh Johnson May 18 '12 at 19:48
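
The log check suggested in the comments above can also be scripted. The following is a minimal sketch, not a definitive tool: the function names are hypothetical, and the pattern assumes the kernel's usual "invoked oom-killer" and "Out of memory" wording. Which log file to scan depends on the syslog configuration.

```python
import re

# Matches the usual kernel OOM messages, case-insensitively.
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the lines (trailing newline stripped) that mention the OOM killer."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log(path):
    """Scan one log file, e.g. /var/log/messages, for OOM killer activity."""
    with open(path, errors="replace") as f:
        return find_oom_lines(f)
```

For example, `scan_log("/var/log/messages")` returns the matching lines so their timestamps can be compared with the time of the hang. An empty result does not rule out memory exhaustion, since a machine that locks up hard may not manage to write its last messages to disk.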

1 Answer

Another great way to track your memory usage is to install sar on your Linux box. On Debian it is provided by the sysstat package. Running `sar -r` gives you a picture of memory usage over time; for network statistics, use `sar -n DEV`.

kenorb
Linztm