How to debug Linux server reboots?

Question

I have a Debian 10 server that keeps rebooting. journalctl can list the previous boots:

journalctl --list-boots
-6 1ee519dc5bc24e88af75cc609ee32093 Mon 2023-02-06 21:02:02 UTC—Sun 2023-02-12 17:23:28 UTC
-5 bb25fc752ac1428abb87bab15a3cea8b Sun 2023-02-12 17:26:04 UTC—Sun 2023-02-12 17:34:59 UTC
-4 91245b74acdc4c7086ebc4a626d55dcc Sun 2023-02-12 17:37:39 UTC—Sun 2023-02-12 21:48:10 UTC
-3 e3978f5222164454be6ebcd12a1ea65b Sun 2023-02-12 21:50:48 UTC—Sun 2023-02-12 22:38:56 UTC
-2 b3bc3015a73a4661af9f2c277e9bc03d Sun 2023-02-12 22:42:02 UTC—Mon 2023-02-13 02:02:07 UTC
-1 57f4a16489904888acc285ed090afaa7 Mon 2023-02-13 02:04:40 UTC—Mon 2023-02-13 04:04:46 UTC
 0 28efdbf5275f4320ad11f3075b66aa95 Mon 2023-02-13 04:07:21 UTC—Mon 2023-02-13 08:33:09 UTC

However, it's not clear whether the system was rebooted by a user, the kernel crashed, or the power was cut. Is there a tool that can provide this information?

What infrastructure are you running the OS on? A virtual machine? A dedicated server? – Zareh Kasparian, Feb 13 '23 at 08:54
If the server crashes, the last log entries could be lost anyway. Sending kernel messages over UDP (using the netconsole kernel module) to a remote system for storage might preserve more of them. – A.B, Feb 13 '23 at 12:35

score 1 · Accepted Answer · answered Feb 15 '23 at 15:46

I wrote a simple bash tool that automatically collects additional information about reboots. Internally the script uses journalctl, so it might work on any Linux distribution that uses systemd.

The idea is simple: for each boot session, check the logs for known entries such as:

system received SIGTERM
asked to shutdown
SEGFAULT
kernel BUG

Confirming a crash is complicated. That's why some sessions are marked as `CRASH?`, which means that the session's log ends abruptly without any recognized message. Sometimes a SEGFAULT gets logged, sometimes not.

This might help the operator to focus on boot sessions with suspicious entries.
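The per-boot check described above can be roughly approximated in bash. This is a minimal sketch, not the crashctl source (which isn't shown here): the function names `classify_boot_end` and `report_boots` are hypothetical, the patterns are taken from the list of known entries above, and inspecting only the last 50 journal entries of each boot is an assumed window. Expect false positives: a daemon that logged "received SIGTERM" during a routine restart shortly before a crash would be misread as a clean reboot.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the actual crashctl implementation.

# Reads one boot's journal text on stdin and prints a guess at how it ended.
# Patterns come from the list of known entries; matching is case-insensitive.
classify_boot_end() {
  local log
  log=$(cat)
  if grep -qi 'received SIGTERM' <<<"$log"; then
    echo "reboot (SIGTERM)"
  elif grep -qi 'asked to shutdown' <<<"$log"; then
    echo "shutdown requested"
  elif grep -qi 'Power key pressed' <<<"$log"; then
    echo "power key pressed"
  elif grep -qiE 'kernel BUG|segfault' <<<"$log"; then
    echo "CRASH (kernel BUG or segfault logged)"
  else
    # The log simply stops without any recognized message.
    echo "CRASH?"
  fi
}

# Prints one line per boot listed by `journalctl --list-boots`.
# tail_lines (default 50) is an assumed window size, not a crashctl option.
report_boots() {
  local tail_lines=${1:-50} offset verdict
  journalctl --list-boots --no-pager | while read -r offset _; do
    # Skip a header line, in case this systemd version prints one.
    [[ $offset =~ ^-?[0-9]+$ ]] || continue
    if [[ $offset == 0 ]]; then
      verdict="running"   # the current boot has not ended yet
    else
      verdict=$(journalctl -b "$offset" -n "$tail_lines" --no-pager 2>/dev/null \
        | classify_boot_end)
    fi
    printf '%-5s %s\n' "$offset" "$verdict"
  done
}
```

Source the file and run `report_boots` as root (or as a user allowed to read the system journal). Checking the orderly-shutdown markers before the crash markers is a deliberate choice: a segfault logged somewhere in the window does not by itself mean the boot ended in a crash.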

$ crashctl
Distribution        : Debian GNU/Linux 10 (buster)
Kernel              : 4.19.0-23-amd64 #1 SMP Debian 4.19.269-1 (2022-12-20)
Current boot        : 606aaecb-b14d-4bbc-9598-b6c60233a888
Scaled load         : 0.04 0.01 0.00 
System installed    : Tue Jan  3 09:26:13 UTC 2023
System started      : Mon Feb  6 03:11:44 CET 2023
Uptime              : up 7 days
Running processes   : 384
kdump               : current state   : ready to kdump
Boot First message             Last message             Uptime       Reboot/Crash
-------------------------------------------------------------------------------------
-11  2022-12-05 20:43:53 UTC   2022-12-05 20:52:00 UTC  0d 00:08:07  reboot (SIGTERM)
-10  2022-12-06 07:56:01 UTC   2022-12-06 15:14:36 UTC  0d 07:18:35  CRASH?
-9   2022-12-07 12:28:07 UTC   2022-12-10 16:33:43 UTC  3d 04:05:36  reboot (SIGTERM)
-8   2022-12-12 08:56:05 UTC   2022-12-18 08:18:40 UTC  5d 23:22:35  CRASH?
-7   2022-12-18 08:32:27 UTC   2022-12-25 10:54:03 UTC  7d 02:21:36  reboot (SIGTERM)
-6   2022-12-28 10:51:54 UTC   2022-12-29 12:12:32 UTC  1d 01:20:38  Power key pressed, but ignored
-5   2023-01-02 08:45:54 UTC   2023-01-06 08:05:01 UTC  3d 23:19:07  CRASH?
-4   2023-01-06 10:07:00 UTC   2023-01-12 10:01:25 UTC  5d 23:54:25  Power key pressed, but ignored
-3   2023-01-12 10:04:36 UTC   2023-01-28 14:07:19 UTC  16d 04:02:43 reboot (SIGTERM)
-2   2023-01-30 08:43:42 UTC   2023-01-31 07:27:26 UTC  0d 22:43:44  reboot (SIGTERM)
-1   2023-02-02 12:41:51 UTC   2023-02-04 13:16:19 UTC  2d 00:34:28  reboot (SIGTERM)
0    2023-02-06 03:12:01 UTC   2023-02-13 18:17:52 UTC  7d 15:05:51  running
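The Uptime column is the difference between a boot's first and last journal entry. A minimal sketch of that arithmetic, assuming GNU `date` (for `-d`) and timestamps in the format shown above; `boot_uptime` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: formats the time between two timestamps as
# "Nd HH:MM:SS", like the Uptime column. Requires GNU date (-d).
boot_uptime() {
  local first last secs
  first=$(date -u -d "$1" +%s) || return 1
  last=$(date -u -d "$2" +%s) || return 1
  secs=$(( last - first ))
  printf '%dd %02d:%02d:%02d\n' \
    $(( secs / 86400 )) $(( secs % 86400 / 3600 )) \
    $(( secs % 3600 / 60 )) $(( secs % 60 ))
}
```

For example, boot -5 from the question ran from 17:26:04 to 17:34:59 on 2023-02-12, so `boot_uptime '2023-02-12 17:26:04 UTC' '2023-02-12 17:34:59 UTC'` prints `0d 00:08:55`.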

asktyagi · Answer 2 · score 0 · 2023-02-13T09:37:58.453

Please try the following sequence:

Check the output of `last` or `journalctl --list-boots` and note the date and time of each reboot (keep the same date/time format when searching).
Open /var/log/messages (on Debian, messages from daemons such as systemd go to /var/log/syslog instead) and search for the same date and time. If the logs have been rotated, check the older files.
Check what happened just before the reboot.
If you see services being stopped, the server was rebooted normally: by a user, by a scheduled job, or via the console (in the case of a cloud instance).
In the case of a crash you may see a crash trace.
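The last three steps can also be done directly on the journal instead of the rotated text logs. A minimal sketch, assuming systemd's usual "Stopping <unit>..." and "Stopped <unit>." messages from PID 1 during shutdown; both function names are hypothetical, and the check is only a heuristic (a service restarted by hand just before a crash would also match):

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the manual steps; names are illustrative only.

# Print the last LINES journal entries (default 100) of boot OFFSET
# (default -1, the previous boot), i.e. what happened just before it ended.
boot_tail() {
  local offset=${1:--1} lines=${2:-100}
  journalctl -b "$offset" -n "$lines" --no-pager -o short-iso
}

# Read journal text on stdin. PID 1 logs "Stopping ..." / "Stopped ..."
# while shutting services down, so their presence points to an orderly
# reboot; their absence suggests a crash or a power cut.
shutdown_kind() {
  if grep -qE 'systemd\[1\]: (Stopping|Stopped) '; then
    echo "orderly shutdown"
  else
    echo "no service stops logged (possible crash or power cut)"
  fi
}
```

For example, `boot_tail -1 | shutdown_kind` checks how the previous boot ended, and `boot_tail -3 200` shows more context for an older boot.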

edited Feb 13 '23 at 09:37 · answered Feb 13 '23 at 08:58 by asktyagi

I don't find `last` very useful. For example, it shows an entry from a month ago, `reboot system boot 4.19.0-23-amd64 Tue Jan 3 10:26 still running`, as "still running", even though the system has crashed at least 5 times since then. – Tombart Feb 13 '23 at 09:06
Are you sure there is nothing wrong with the physical server itself? If it is an HP server, check the hardware logs through iLO. – Zareh Kasparian Feb 13 '23 at 09:11
You can use the `journalctl --list-boots` output to get the last reboot times; the remaining steps are the same. Do you find anything interesting in the message logs? – asktyagi Feb 13 '23 at 09:29
Yes, there's definitely something wrong with the server. I know about `journalctl --list-boots`; its output is in the question. The information it provides just isn't sufficient. – Tombart Feb 13 '23 at 11:42
