Rebooted Ubuntu Server, didn't come back up. How do I investigate the cause?

Question

Following on from my previous question, where I got excellent advice from LinuxDevOps that I partially chose to ignore: the worst happened, and I don't know why, or how to investigate the cause.

I have a dedicated server running Ubuntu Server 13.10.

I had some kernel updates requiring a reboot, so I left it a week, and at 2am Saturday night I rebooted using `sudo reboot -r now`. The server didn't come back up. I couldn't connect via SSH, or reach Apache over HTTP either.

I was locked out, so I had to have the datacentre come to the rescue. They reported that they simply rebooted the server and all was fine, and then I was back online.

So now, I've learned the hard way, and LinuxDevOps was right to bold point number 1 in his answer. I took it as an unlikely precaution that I could skip...

How can I investigate what went wrong?

Right now I daren't reboot the server again, as I've nothing to suggest the failure won't be repeated.

Thanks.

Apr  6 02:20:24 kernel: imklog 5.8.11, log source = /proc/kmsg started.
Apr  6 02:20:34 kernel: imklog 5.8.11, log source = /proc/kmsg started.
Apr  6 03:38:13 kernel: imklog 5.8.11, log source = /proc/kmsg started.
Apr  6 03:38:13 kernel: [    0.000000] Initializing cgroup subsys cpuset
Apr  6 03:38:13 kernel: [    0.000000] Initializing cgroup subsys cpu
Apr  6 03:38:13 kernel: [    0.000000] Initializing cgroup subsys cpuacct
etc... normal startup

This is from /var/log/kern.log. To me, the first two entries look like an attempt to boot that only got to a particular point, followed an hour later by a normal startup when the datacentre rebooted it.

Thanks.

This is why people get remote management systems like HP Integrated Lights-Out (iLO) and its remote console on industry-class systems. You don't need to go down to the datacenter. — mveroone, Apr 07 '14 at 09:29
@Kwaio Because: *Ubuntu*? Because: *Dedicated Server*? Because: *Supermicro*? — ewwhite, Apr 07 '14 at 14:45
@ewwhite I don't understand your comment, can you elaborate please? Thanks. — i-CONICA, Apr 07 '14 at 14:47
@i-CONICA Well, I often see Ubuntu on servers hosted with dedicated hosts/hosting facilities where the end-user doesn't have the tools available for [**out-of-band management access**](http://en.wikipedia.org/wiki/Out-of-band_management) to their systems. — ewwhite, Apr 07 '14 at 14:50
Yes that's me. It's a very powerful fully dedicated headless server running Ubuntu Server plugged into a rack in a datacentre thousands of miles away via a 1Gb/s pipe. :) — i-CONICA, Apr 07 '14 at 14:56
I'm sad that I was right in this case; if it's any consolation, this happens a lot. Has the server booted with the old or the new kernel? Kernel-hardware issues are hard to debug and messages from logs are cryptic at best. Unless a kernel change is really needed, I'd say hold back its updates. — LinuxDevOps, Apr 07 '14 at 15:30
It's booted with the old one, I assume, as the same 3 kernel-related updates are still "held back", so it's waiting for a reboot again. The server is purring along happily, so I daren't rock the boat and want to adhere to the "don't fix what ain't broke" mantra by just leaving it alone, but I don't like the thought that I can't reboot the server successfully if and when I next need to. Can I ignore these updates or remove them from pending? Right now, they're blocking any further regular updates to the AMP part of the stack! :) Thanks. — i-CONICA, Apr 07 '14 at 15:40
Hold back a package with `sudo apt-mark hold package` or `echo package hold | sudo dpkg --set-selections`. — LinuxDevOps, Apr 07 '14 at 16:09

score 0 · Answer 1 · answered Apr 07 '14 at 09:04

About the only thing you can do now is look at your logs for information that may relate to the problems you had. Erm, that's it.
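
As a rough illustration of what "looking at the logs" could mean for kern.log, here is a minimal Python sketch. It assumes the log format shown in the question: an `imklog ... started.` line each time the logger's kernel module starts, and kernel messages stamped `[    0.000000]` only on a real boot. The function name `classify_log_starts` and both patterns are illustrative assumptions, not part of any Ubuntu tool. It flags each logger start that is not immediately followed by kernel boot messages, which is only a heuristic for "something restarted here but the kernel did not log a boot".

```python
import re

# Assumption: the logger announces itself like the lines quoted in the question.
IMKLOG_START = re.compile(r"kernel: imklog .* started\.")
# Assumption: kernel messages stamped at time zero only appear on a real boot.
KERNEL_BOOT = re.compile(r"kernel: \[\s*0\.000000\]")


def classify_log_starts(lines):
    """Return (timestamp, followed_by_kernel_boot) for each imklog start line."""
    results = []
    for i, line in enumerate(lines):
        if not IMKLOG_START.search(line):
            continue
        timestamp = line[:15]  # classic syslog prefix, e.g. "Apr  6 02:20:24"
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        results.append((timestamp, bool(KERNEL_BOOT.search(next_line))))
    return results


# Sample taken from the kern.log excerpt in the question.
sample = [
    "Apr  6 02:20:24 kernel: imklog 5.8.11, log source = /proc/kmsg started.",
    "Apr  6 02:20:34 kernel: imklog 5.8.11, log source = /proc/kmsg started.",
    "Apr  6 03:38:13 kernel: imklog 5.8.11, log source = /proc/kmsg started.",
    "Apr  6 03:38:13 kernel: [    0.000000] Initializing cgroup subsys cpuset",
]

for timestamp, booted in classify_log_starts(sample):
    status = "kernel boot follows" if booted else "no kernel boot messages follow"
    print(timestamp, status)
```

To run it against the real file, read the lines of /var/log/kern.log (as root if needed) and pass them in place of `sample`. The timestamps it flags are then the windows to check in the other files under /var/log.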

user9517


Which logs might I look at? Other than mysql, php and apache error logs, I'm pretty clueless. Thanks. – i-CONICA Apr 07 '14 at 09:06
@i-CONICA: erm whatever you have in /var/log/... https://help.ubuntu.com/community/LinuxLogFiles – user9517 Apr 07 '14 at 09:09
Hi, I've already looked at /var/log/kern.log but it doesn't tell me anything. It shows 2 entries that look like the start of a startup, then an hour later (after the downtime, when the datacentre rebooted) a normal startup. See edited question. – i-CONICA Apr 07 '14 at 09:19
@i-CONICA: If your logs don't show anything useful then you're out of luck, but I imagine there are more logs for you to read through than kern.log. – user9517 Apr 07 '14 at 09:22
