
Our high-load server is stuck in a kernel panic. How can we identify the cause and correct it?

It is a Dell PowerEdge R940 with 256 GB of RAM and a 20 TB disk array, with the root filesystem on an SSD. It runs CentOS 7. When I try to reboot, I only get the following two kernel panic messages: kernel panic 1, kernel panic 2

I would appreciate any help.

1 Answer


Legally obligatory message: I work for Dell.

Is it possible? Yes. Is it worth it? If you have a Linux nerd on hand… maybe. Sometimes these things are easy kills, sometimes they are rat holes.

Operationally, if only the drive with the OS is affected but the RAID is fine, just reinstall the OS. It is unlikely to be worth spending a bunch of time troubleshooting a kernel panic. Moreover, it’s worth getting off CentOS 7, which is long dead as far as effective updates go. RHEL is already at 9.1. CentOS 7 is nearly a decade old at this point.

Grant Curell
  • Yes, but I need it to be compatible with a Lustre system we installed in the cluster 5 years ago. OTOH, the PowerEdge R940 has two M.2 ports; should I try fitting a second M.2 drive and installing a second operating system on it? – user2309000 Mar 22 '23 at 19:09
  • Ah yes, HPC and its never-ending quest to go un-updated. My overly forthright answer is that if you're desperate enough to be posting to Server Fault, then yes, any direction you go is going to require you to find a way to reinstall the OS. It's unlikely you'll be able to troubleshoot a kernel panic over a forum. That said, I would also strongly recommend, even if it's an HPC system, that you consider paying the technical debt sooner rather than later - CentOS 7 goes completely EOL in a year and it's already effectively dead. The only way it makes sense to me to stay on – Grant Curell Mar 22 '23 at 19:27
  • CentOS 7 is if you are *extremely* invested in very specific optimizations for the given configuration. From a maintenance perspective, though, it isn't going to be fun if you have any package dependencies - particularly given that many HPC workloads depend on libraries that require newer kernels. – Grant Curell Mar 22 '23 at 19:29