
Following on from a previous question, where I got excellent advice from LinuxDevOps that I partially chose to ignore: the worst happened, and I don't know why, or how to investigate the cause.

I have a dedicated server running Ubuntu Server 13.10.

I had some kernel updates requiring a reboot, so I left it a week, and at 2am Saturday night I rebooted using `sudo reboot -r now`. The server didn't come back up. I could no longer connect via SSH or reach Apache over HTTP.

I was locked out, so I had to ask the datacentre to come to the rescue. They reported that they simply rebooted the server and all was fine; after that I was back online.

So now I've learned the hard way that LinuxDevOps was right to put point number 1 of his answer in bold. I took it as an unlikely precaution that I could skip...

How can I investigate what went wrong?

Right now I can't risk rebooting the server again, as I have nothing to suggest the failure won't be repeated.

Thanks.

Apr  6 02:20:24 kernel: imklog 5.8.11, log source = /proc/kmsg started.
Apr  6 02:20:34 kernel: imklog 5.8.11, log source = /proc/kmsg started.
Apr  6 03:38:13 kernel: imklog 5.8.11, log source = /proc/kmsg started.
Apr  6 03:38:13 kernel: [    0.000000] Initializing cgroup subsys cpuset
Apr  6 03:38:13 kernel: [    0.000000] Initializing cgroup subsys cpu
Apr  6 03:38:13 kernel: [    0.000000] Initializing cgroup subsys cpuacct
etc... normal startup

This is from /var/log/kern.log. To me, the first two entries look like an attempt to boot that only got to a particular point; the normal startup an hour later is from when the datacentre rebooted the server.
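To make that reading of kern.log easier to check, the lines that mark a logger start or a fresh kernel boot can be pulled out on their own. This is a minimal sketch, assuming the rsyslog-style format shown in the excerpt above; the function name `boot_markers` is made up for illustration. As an assumption on my part (the excerpt does not confirm it), an `imklog ... started.` line that is not followed by `[    0.000000]` messages may mean only the logging daemon restarted, not that the kernel booted.

```shell
#!/bin/sh
# boot_markers FILE
# Hypothetical helper: print only the lines that mark a logging or kernel start.
#   "imklog ... started."   -> rsyslog's kernel-log module starting
#   "[    0.000000] ..."    -> first kernel messages of a fresh boot
# The patterns assume the kern.log format shown in the question.
boot_markers() {
    grep -E 'imklog .* started\.|\[ +0\.000000\]' "$1"
}
```

It could be run as `boot_markers /var/log/kern.log` (reading the file may need sudo, depending on permissions).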

Thanks.

i-CONICA
  • This is why people get remote management systems like HP Integrated Lights-Out, with its remote console, on industry-class systems. You don't need to go down to the datacenter. – mveroone Apr 07 '14 at 09:29
  • @Kwaio Because: *Ubuntu*? Because: *Dedicated Server*? Because: *Supermicro*? – ewwhite Apr 07 '14 at 14:45
  • @ewwhite I don't understand your comment, can you elaborate please? Thanks. – i-CONICA Apr 07 '14 at 14:47
  • @i-CONICA Well, I often see Ubuntu on servers hosted with dedicated hosts/hosting facilities where the end-user doesn't have the tools available for [**out-of-band management access**](http://en.wikipedia.org/wiki/Out-of-band_management) to their systems. – ewwhite Apr 07 '14 at 14:50
  • Yes that's me. It's a very powerful fully dedicated headless server running Ubuntu Server plugged into a rack in a datacentre thousands of miles away via a 1Gb/s pipe. :) – i-CONICA Apr 07 '14 at 14:56
  • I'm sad that I was right in this case; if it's any consolation, this happens a lot. Has the server booted with the old or the new kernel? Kernel/hardware issues are hard to debug and the log messages are cryptic at best. Unless a kernel change is really needed, I'd say hold back its updates. – LinuxDevOps Apr 07 '14 at 15:30
  • It's booted with the old one, I assume, as the same 3 kernel-related updates are still "held back", so it's waiting for a reboot again. The server is purring along happily, so I daren't rock the boat and want to adhere to the "don't fix what ain't broke" mantra by just leaving it alone, but I don't like the thought that I can't reboot the server successfully if and when I next need to. Can I ignore these updates or remove them from pending? Right now they're blocking any further regular updates to the AMP part of the stack! :) Thanks. – i-CONICA Apr 07 '14 at 15:40
  • Hold back a package with `sudo apt-mark hold package` or `echo package hold | sudo dpkg --set-selections` – LinuxDevOps Apr 07 '14 at 16:09
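On a server you are nervous about touching, it may help to see the hold commands before running them. The following is a sketch only: the helper `hold_packages` and the `APPLY` switch are invented for illustration, and the package names in the usage note are placeholders, since the thread does not name the three held-back packages.

```shell
#!/bin/sh
# hold_packages PKG...
# Hypothetical helper. Dry-run by default: prints the apt-mark commands
# instead of running them. Set APPLY=1 to actually run them (needs sudo).
hold_packages() {
    for pkg in "$@"; do
        if [ "${APPLY:-0}" = "1" ]; then
            sudo apt-mark hold "$pkg"
        else
            echo "sudo apt-mark hold $pkg"
        fi
    done
}
```

For example, `hold_packages pkg-a pkg-b` prints the two commands, and `APPLY=1 hold_packages pkg-a pkg-b` runs them. Afterwards, `apt-mark showhold` lists the current holds and `sudo apt-mark unhold package` reverses one.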

1 Answer


About the only thing you can do now is look through your logs for information that may relate to the problem you had. That's it.

user9517
  • Which logs might I look at? Other than mysql, php and apache error logs, I'm pretty clueless. Thanks. – i-CONICA Apr 07 '14 at 09:06
  • @i-CONICA: erm whatever you have in /var/log/... https://help.ubuntu.com/community/LinuxLogFiles – user9517 Apr 07 '14 at 09:09
  • Hi, I've already looked at /var/log/kern.log but it doesn't tell me anything. It shows 2 entries that look like the start of a startup, then an hour later (after the downtime, when the datacentre rebooted) a normal startup. See edited question. – i-CONICA Apr 07 '14 at 09:19
  • @i-CONICA: If your logs don't show anything useful then you're out of luck, but I imagine there are more logs for you to read through than kern.log. – user9517 Apr 07 '14 at 09:22
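Reading through "whatever you have in /var/log" can be narrowed to the window around the failed reboot. Here is a rough sketch, assuming the traditional syslog timestamp format seen in the question's kern.log excerpt (`Apr  6 02:20:24`, with two spaces before a single-digit day); the function name `logs_around` is hypothetical.

```shell
#!/bin/sh
# logs_around DIR PREFIX
# Hypothetical helper: print every line starting with PREFIX from the
# regular files directly under DIR, prefixed with the file name.
# Skips rotated .gz archives and ignores binary files (grep -I).
# PREFIX is used as a regular expression anchored at line start.
logs_around() {
    dir=$1
    prefix=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case "$f" in *.gz) continue ;; esac
        grep -H -I -- "^$prefix" "$f" 2>/dev/null || true
    done
}
```

For the incident in the question this might be called as `logs_around /var/log 'Apr  6 02:'`, possibly under sudo since some logs are not world-readable. It only looks at the top level of the directory, so logs in subdirectories such as the Apache ones would need a separate call.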