
I'm sorry if this starts out really vague, but I have no idea where to turn.

We have 4 Dell R610 servers with 2 x 2.8GHz Intel 5650 CPUs and 16GB RAM.

These servers randomly reboot themselves.

Here are the last couple of reboots of Server 1:

reboot   system boot  3.11.0-15-generi Thu Jun 26 19:43 - 20:33  (00:49)
reboot   system boot  3.11.0-15-generi Tue Jun 24 01:15 - 20:33 (2+19:17)

The syslog and kern.log show nothing at the point of the reboot:

kern.log

Jun 24 01:51:36 encoder1 kernel: [ 2319.677008] traps: vlc[29658] trap divide error ip:7fbefd013f3a sp:7fbede8bcd58 error:0 in libc-2.15.so[7fbefcfd8000+1b5000]
Jun 24 01:51:37 encoder1 kernel: [ 2320.681917] traps: vlc[29676] trap divide error ip:7f5c23cdbf3a sp:7f5c0553dd58 error:0 in libc-2.15.so[7f5c23ca0000+1b5000]
Jun 26 19:43:59 encoder1 kernel: imklog 5.8.6, log source = /proc/kmsg started.
Jun 26 19:43:59 encoder1 kernel: [    0.000000] Initializing cgroup subsys cpuset

syslog

Jun 26 19:37:52  snmpd[1613]: last message repeated 12 times
Jun 26 19:38:52  snmpd[1613]: last message repeated 2 times
Jun 26 19:43:59 encoder1 kernel: imklog 5.8.6, log source = /proc/kmsg started.
Jun 26 19:43:59 encoder1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="1180" x-info="http://www.rsyslog.com"] start
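
To make the "nothing at the reboot point" observation easier to check across all the servers, here is a minimal Python sketch that finds the last line logged before each boot and the size of the silent gap. It assumes the classic syslog timestamp format shown above and treats the `imklog ... started` line as the boot marker. The function names and the hard-coded year are my own assumptions (syslog lines carry no year), so treat it as a starting point rather than a finished tool.

```python
import re
from datetime import datetime

# Classic syslog timestamp at the start of a line, e.g. "Jun 26 19:43:59"
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})")


def parse_stamp(line, year=2014):
    """Return the line's timestamp, or None if it has no syslog prefix.

    The year is an assumption passed in by the caller, because the
    log format itself does not record one.
    """
    m = STAMP.match(line)
    if not m:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def gaps_before_boot(lines, year=2014):
    """Return (last_seen, boot_time, last_line) for each boot marker.

    A boot marker is assumed to be the rsyslog "imklog ... started" line.
    A boot marker with no earlier timestamped line is skipped.
    """
    results = []
    prev = None  # (timestamp, raw line) of the previous timestamped line
    for line in lines:
        ts = parse_stamp(line, year)
        if ts is None:
            continue
        if "imklog" in line and "started" in line and prev is not None:
            results.append((prev[0], ts, prev[1].rstrip()))
        prev = (ts, line)
    return results
```

Feeding it the lines of /var/log/syslog from each server gives one tuple per boot. If the last line before every boot is ordinary traffic (snmpd here) with no shutdown messages, that points to an abrupt reset rather than an OS-initiated reboot.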

We have access to iDRAC, and the last message is this:

 "A fatal IO error detected on a component at"

I have no idea where to look. Server 1 has just been replaced with an identical server, and that didn't fix the issue either.

I am sorry that there is very little info here. Please ask for anything and I will get it to you as soon as I can.

Regards and thanks in advance

Liam

1 Answer


I'd recommend using the OpenManage bootable CD to run diagnostics on the machines. This will give you a detailed health report from multiple devices and sensors on the server. If your servers are under warranty/support, this is probably one of the first tools Dell would recommend.

The CD will boot into a CentOS environment. There should be an icon on the desktop to run the report. The ReadMe file in the link has more details on running the utilities.

Mike Naylor
  • We have run this program on all our servers, and the company that sold us the servers says they run it before shipping them to us. We have 5 identical servers now; all but one reboot, and they all pass the diagnostics. Cheers, Liam – Liam Jun 26 '14 at 20:10
  • Are all 4 that are rebooting throwing the same errors? – Mike Naylor Jun 30 '14 at 16:32
  • The same error as above, yes, but the error is empty. We have reverted to another piece of software to see if that is causing the issue. – Liam Jul 02 '14 at 17:33