
I'm by no means experienced with system administration. I have a Linode VM mostly for the fun of it, hosting a webserver (nginx -> nodejs) and a Minecraft server I share with some friends.

I regularly (1+/day) get e-mails notifying me about system boots, e.g.:

Lassie initiated boot - Completed Tue, 29 Jan 2013 09:52:17 GMT

I've no idea why the machine is rebooting. Even more ignorant on my part, I don't know how to debug this issue. I've read /var/log/syslog around the reported boot times, and I can see the usual boot sequence dump, but nothing unusual prior to that.
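Reading syslog "around the reported boot times" can be narrowed to a fixed window before each boot. The sketch below is a hypothetical helper (`lines_before_boot` is not an existing tool). It assumes the classic syslog prefix `Mon DD HH:MM:SS`, which carries no year, and a server clock on the same timezone as Lassie's GMT timestamp. Keep in mind that a kernel panic may stop messages from ever reaching the disk, so an empty window does not rule much out.

```python
from datetime import datetime, timedelta

def lines_before_boot(lines, boot_time, minutes=15):
    """Return syslog lines stamped within `minutes` before `boot_time`.

    Assumes the classic "Mon DD HH:MM:SS" prefix (no year), so the year is
    borrowed from boot_time. Lines without a parseable stamp are skipped.
    """
    start = boot_time - timedelta(minutes=minutes)
    selected = []
    for line in lines:
        stamp = line[:15]  # e.g. "Jan 29 09:52:17" or "Jan  9 09:52:17"
        try:
            # Prepend the year so Feb 29 parses and no year is guessed.
            parsed = datetime.strptime(f"{boot_time.year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped syslog line
        if start <= parsed < boot_time:
            selected.append(line.rstrip("\n"))
    return selected
```

For example, `lines_before_boot(open("/var/log/syslog"), datetime(2013, 1, 29, 9, 52, 17))` lists what was logged in the 15 minutes before that boot; rotated files such as /var/log/syslog.1 can be fed in the same way.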

How can I tackle the problem, and figure out what's going on?

salezica
  • Have you contacted Linode support? – Tom O'Connor Jan 29 '13 at 22:46
  • http://library.linode.com/monitoring-and-maintaining – Michael Hampton Jan 29 '13 at 22:53
  • Tom: no, not yet. I figured this would be something on my end. Michael: awesome link! I'm reading it right now. – salezica Jan 29 '13 at 23:11
  • Also take a look in /var/log/messages; it can sometimes contain useful information. The other way to troubleshoot would be to stop all of the services but one and see if the box still reboots, then gradually re-add services until you find the one that causes the reboot, and check that software's log file. – Steve Jan 29 '13 at 23:16

2 Answers


The kernel is running out of memory and panicking. I forgot I had originally set panic_on_oom back when the machine was only hosting the nginx+nodejs webserver.
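To check how the kernel is currently configured, the setting can be read from /proc/sys/vm/panic_on_oom. A minimal sketch with illustrative helper names; the value meanings are paraphrased from the kernel's documentation of the vm.panic_on_oom sysctl.

```python
from pathlib import Path

# Paraphrased from the kernel's documentation of the vm.panic_on_oom sysctl.
MEANINGS = {
    0: "no panic: the OOM killer kills a process and the system keeps running",
    1: "panic on OOM, unless the OOM is confined to a cpuset/mempolicy/memory cgroup",
    2: "always panic on OOM",
}

def read_panic_on_oom(path="/proc/sys/vm/panic_on_oom"):
    """Return the current vm.panic_on_oom value as an int."""
    return int(Path(path).read_text().strip())

def panic_on_oom_meaning(value):
    """Describe what a vm.panic_on_oom value does."""
    return MEANINGS.get(value, f"unrecognised value {value}")
```

On the server, `panic_on_oom_meaning(read_panic_on_oom())` shows whether an out-of-memory condition will trigger the OOM killer or a panic (and hence a Lassie reboot).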

I had actually configured the Java Minecraft instance to restrict its memory usage via -Xmx and -Xms, but a more in-depth read on JVM memory usage tells me these values only apply to the heap, and the process's actual memory usage can easily be double the caps I set.
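To see how far the JVM's real footprint exceeds its heap cap, one can compare the -Xmx value from the command line with the resident set size (VmRSS) in /proc/<pid>/status. A sketch with hypothetical helper names, assuming only the k/m/g size suffixes (no suffix meaning bytes):

```python
import re

# JVM size suffixes: k = KiB, m = MiB, g = GiB; no suffix = bytes.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}

def xmx_bytes(jvm_args):
    """Return the -Xmx heap cap in bytes from a JVM argument list, or None."""
    cap = None
    for arg in jvm_args:
        m = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", arg)
        if m:
            # If -Xmx is repeated, keep the last occurrence.
            cap = int(m.group(1)) * _UNITS[m.group(2).lower()]
    return cap

def rss_bytes(status_text):
    """Return VmRSS in bytes from the contents of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1]) * 1024  # the kernel reports it in kB
    return None
```

On the server, `open(f"/proc/{pid}/cmdline").read().split("\0")` gives the argument list and `open(f"/proc/{pid}/status").read()` the status text. An RSS well above the -Xmx cap is the non-heap overhead (thread stacks, class metadata, code cache, direct buffers) that the heap flags do not limit.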

I'll temporarily disable panic_on_oom, and see what I can do to control the JVM's greed.
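Disabling it at runtime amounts to writing 0 to /proc/sys/vm/panic_on_oom, which is what `sysctl -w vm.panic_on_oom=0` does. The sketch below is a hypothetical helper that needs root on a real machine and only changes the running kernel; if the original setting lives in /etc/sysctl.conf or a file under /etc/sysctl.d/, that line also has to change or the panic behaviour returns on the next boot.

```python
from pathlib import Path

def set_panic_on_oom(value, path="/proc/sys/vm/panic_on_oom"):
    """Write a new vm.panic_on_oom value to the running kernel (root needed).

    Runtime-only change, equivalent to `sysctl -w vm.panic_on_oom=<value>`.
    """
    if value not in (0, 1, 2):
        raise ValueError("vm.panic_on_oom must be 0, 1 or 2")
    Path(path).write_text(f"{value}\n")
```

With the value at 0, the OOM killer terminates a process (most likely the JVM) instead of panicking, and the kernel log should then name the victim.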

Thanks everyone for the help!

salezica

Can you install "atop" on the machine? Launch the atop daemon: by default it saves a snapshot every 5 minutes with values such as running processes, memory used, CPU load, network load, disk load, etc. After the next reboot, open the log files with the atop utility (the "-r" option) and replay the last hour (the "t" and "T" keys); it should give you a clue about what is causing the problem.
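Finding the right raw file to replay can be scripted. This sketch assumes the atop_YYYYMMDD file naming that the Debian/Ubuntu atop package uses under /var/log/atop (other distributions may differ); `atop_replay_command` is a hypothetical helper that only builds the command line and does not run atop.

```python
from pathlib import Path

def atop_replay_command(log_dir="/var/log/atop", day=None):
    """Build an `atop -r` command for one day's raw log file.

    Assumes files named atop_YYYYMMDD; with day=None the newest log is used
    (names sort chronologically because of the YYYYMMDD format).
    """
    logs = sorted(Path(log_dir).glob("atop_*"))
    if day is not None:
        logs = [p for p in logs if p.name == f"atop_{day}"]
    if not logs:
        raise FileNotFoundError(f"no matching atop logs in {log_dir}")
    return ["atop", "-r", str(logs[-1])]
```

`subprocess.run(atop_replay_command(day="20130129"))` then opens the interactive viewer on that day's log, where "t" steps forward and "T" steps back through the samples.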

Rosco