One server reboots every few days, seemingly at random. No log mentions any error before such a reboot. For example, one reboot happened between these two messages from /var/log/messages and journalctl:

Mar 13 11:25:01 server something: some action 
Mar 13 14:33:00 server rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="2460" x-info="http://www.rsyslog.com"] start

The only clue I could find appears after the reboot: somewhere during startup, the kernel prints this IPMI line:

Mar 13 14:33:00 server kernel: [   24.621566] Copyright (C) 2004 MontaVista Software - IPMI Powerdown via sys_reboot.

All other messages, IPMI and otherwise, seem normal. The motherboard of this server is an Intel S5000PSL. Some output from ipmitool:

# ipmitool mc watchdog get
Watchdog Timer Use:     BIOS FRB2 (0x01)
Watchdog Timer Is:      Stopped
Watchdog Timer Actions: Hard Reset (0x01)
Pre-timeout interval:   0 seconds
Timer Expiration Flags: 0x00
Initial Countdown:      0 sec
Present Countdown:      0 sec

Does the IPMI keep other logs that I don't know of, and if so, how do I access them?

  • Try enabling maximum power in the BIOS settings and disabling processor C-states and other energy-saving features. It's just a simple suggestion to help track down the cause of this problem. – Mikhail Khirgiy Mar 13 '17 at 16:06

1 Answer

I had exactly the same problem. My next step was logging over a serial console to another machine, which gave no result. The final solution was a support request to the vendor. I got a new motherboard, and everything has been OK since then.
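On the question of other IPMI logs: the BMC keeps its own System Event Log (SEL), independent of the OS, and ipmitool can read it. A minimal sketch follows, assuming ipmitool is installed, the local IPMI interface is available, and the commands are run as root. The function name bmc_report is made up for illustration, and whether this particular board records anything useful about these reboots is not something I can promise.

```shell
# Hypothetical helper: collect BMC-side records that survive an OS crash or reset.
# Assumes ipmitool is on PATH and can reach the local BMC (run as root).
bmc_report() {
    echo "== SEL info =="
    ipmitool sel info               # log size, entry count, last add/erase times

    echo "== SEL entries =="
    ipmitool sel elist              # event list with sensor names decoded

    echo "== Restart cause =="
    ipmitool chassis restart_cause  # what the BMC recorded for the last restart

    echo "== Chassis status =="
    ipmitool chassis status         # includes the "Last Power Event" field
}
```

Calling bmc_report right after an unexpected reboot and comparing the SEL timestamps with the gap in /var/log/messages may show whether the BMC saw a power, temperature, or watchdog event at that time.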

Quantim