
I've been using Debian on Dell servers for many years. For a long time I've been using the PERC H730P RAID controller, which is well supported by utilities like MegaCLI.

I recently bought an R440 server with the new H750 RAID controller. I was initially able to install Debian 11 on logical volumes created from the "BIOS" System Setup. But after a few minutes or hours of using the server to configure the software side, the disks suddenly disappeared.

At boot, Grub was still working, but the Debian boot sequence would stop, unable to find partitions.

The Lifecycle Controller wouldn't report any hardware issue. But the "Support Live Image" (a live CD provided by Dell) would not see any storage controller.

Tech support told me that this new RAID controller is not compatible with Debian (nor with CentOS 7, which is what the SLI live CD runs), and that I should ask for it to be replaced with an older but compatible H730P.

I'm writing this because I couldn't find anything online regarding Debian compatibility with recent Dell RAID controllers.

Hope this helps.

Update 2022-01-31:

I've managed to reinstall a fresh Debian 11.2 without issues. I then installed a backported 5.15.5 kernel (over the default 5.10.0). Everything seems fine.

But when I install MegaCLI, the installation process freezes the whole server. After many Ctrl-C presses and a few minutes, I get a shell back. "megaclisas-status" hangs on "-- Controller information --". After another round of Ctrl-C, I get the shell back.

If I try to purge the "megaclisas-status" and "megacli" packages, everything freezes again.

I've just opened an issue on their tracker: https://github.com/eLvErDe/hwraid/issues/130

Update 2022-02-01:

My issue was rejected on the grounds that this is a kernel issue.

I reinstalled the whole OS with the 5.15 kernel and ran a series of stress tests and benchmarks. Everything seems to be OK.

Then I installed the "megacli" tool and ran a few commands with it; no issue.

Then I installed the "megaclisas-status" package, and the server froze during the installation. After a hard reboot, I can use the system again, but the "megaclisas-status" package is not installed.

Update 2022-09-15:

As of today, still no luck!

Here is a dmesg output containing a lot of information about the hang: https://paste.evolix.org/?6667c4e24e7e8ab4#38LSGMsxTncyQYPzWErWpMRigTPLcLuP9cU8qt9HafMW

We've tried again with the latest backported kernel. It still hangs.

Some details:

# uname -a
Linux my-hostname 5.18.0-0.deb11.4-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1~bpo11+1 (2022-08-12) x86_64 GNU/Linux

# cat /etc/debian_version 
11.5

# dpkg -l | grep mega
ii  megacli                            8.07.14-3+Debian.11.bullseye   amd64        LSI Logic MegaRAID SAS MegaCLI
ii  megaclisas-status                  0.18+Debian.11.bullseye        all          get RAID status out of LSI MegaRAID SAS HW RAID controllers

# dpkg -l | grep linux
ii  console-setup-linux                1.205                          all          Linux specific part of console-setup
ii  firmware-linux-free                20200122-1                     all          Binary firmware for various drivers in the Linux kernel
ii  libselinux1:amd64                  3.1-3                          amd64        SELinux runtime shared libraries
ii  linux-base                         4.6                            all          Linux image base package
ii  linux-image-5.10.0-13-amd64        5.10.106-1                     amd64        Linux 5.10 for 64-bit PCs (signed)
ii  linux-image-5.10.0-17-amd64        5.10.136-1                     amd64        Linux 5.10 for 64-bit PCs (signed)
ii  linux-image-5.18.0-0.deb11.4-amd64 5.18.16-1~bpo11+1              amd64        Linux 5.18 for 64-bit PCs (signed)
ii  linux-image-amd64                  5.18.16-1~bpo11+1              amd64        Linux for 64-bit PCs (meta-package)
ii  util-linux                         2.36.1-8+deb11u1               amd64        miscellaneous system utilities
ii  util-linux-locales                 2.36.1-8+deb11u1               all          locales files for util-linux


# dmidecode --type 1
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.2 present.

Handle 0x0100, DMI type 1, 27 bytes
System Information
    Manufacturer: Dell Inc.
    Product Name: PowerEdge R350
    Version: Not Specified
    Serial Number: 339W7R3
    UUID: 4c4c4544-0033-3910-8057-b3c04f375233
    Wake-up Type: Power Switch
    SKU Number: SKU=NotProvided;ModelName=PowerEdge R350
    Family: PowerEdge

# lspci
00:00.0 Host bridge: Intel Corporation Device 4c53 (rev 01)
00:01.0 PCI bridge: Intel Corporation Device 4c01 (rev 01)
00:06.0 PCI bridge: Intel Corporation Device 4c09 (rev 01)
00:14.0 USB controller: Intel Corporation Device 43ed (rev 11)
00:14.2 RAM memory: Intel Corporation Device 43ef (rev 11)
00:16.0 Communication controller: Intel Corporation Device 43e0 (rev 11)
00:16.4 Communication controller: Intel Corporation Device 43e4 (rev 11)
00:17.0 SATA controller: Intel Corporation Device 43d2 (rev 11)
00:1b.0 PCI bridge: Intel Corporation Device 43c0 (rev 11)
00:1b.6 PCI bridge: Intel Corporation Device 43c6 (rev 11)
00:1c.0 PCI bridge: Intel Corporation Device 43b8 (rev 11)
00:1c.1 PCI bridge: Intel Corporation Device 43b9 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Device 438d (rev 11)
00:1f.4 SMBus: Intel Corporation Device 43a3 (rev 11)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Device 43a4 (rev 11)
01:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
01:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
02:00.0 RAID bus controller: Broadcom / LSI MegaRAID 12GSAS/PCIe Secure SAS39xx
05:00.0 PCI bridge: PLDA PCI Express Bridge (rev 02)
06:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller (rev 04)
07:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
07:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe


### iDRAC INFO

PERC H755 Adapter (Embedded)
- Firmware version  52.16.1-4405

iDRAC version -> 5.10.50.00
BIOS version  -> 1.3.3
jlecour
  • Browsing the web it seems that, yes, this HW RAID controller is still too new for debian. There's a chance it might work with Debian Testing though, maybe you'll try that before replacing the controller. – vautee Jan 25 '22 at 08:52
  • (I work for Dell) See https://www.dell.com/support/contents/en-us/article/product-support/self-support-knowledgebase/enterprise-resource-center/server-operating-system-support for supported operating systems on the PERC and here: https://www.dell.com/support/home/en-us/drivers/supportedos/poweredge-r440 for the R440 specifically. AFAIK Dell does not specifically support Debian anywhere. All our testing is done on Ubuntu since that's what the vast majority of people run. That said, I have yet to run into a scenario where something works on Ubuntu but won't work/can't be made to work on Debian. – Grant Curell Sep 16 '22 at 19:05
  • Unrelated, in your OP you mention that you bought an R440 w/H750 but your dmidecode output indicates you have an R350 with an H755? Was the OP a mistake? – Grant Curell Sep 16 '22 at 19:10
  • Hi Grant. Good catch on the model. We’ve tried on various servers and raid controllers. We will take a look at your links. Thanks – jlecour Sep 17 '22 at 20:51

2 Answers

4

Update on 2022-10-05:

A newer version of Dell's perccli64 is available at the following link:

https://www.dell.com/support/home/zh-cn/drivers/driversdetails?driverid=36g6n

The tar.gz archive there contains a .deb file directly, so you don't need to convert an RPM package with alien. The .deb can be installed directly on Debian 11 and Proxmox (which has been tested).


I searched for the Dell PERC H750 and found that Dell provides PercCli for it instead, as an RPM package for Linux, which means we can use the alien command to convert the package from RPM to DEB. Once installed, PercCli works quite well on Debian 11. It uses the same syntax as Broadcom StorCli, which differs from the traditional MegaCli syntax.

https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=nf8g9

I tested this on my new Dell R640 and it works well.

By the way, list the package contents with dpkg -c xxx.deb before (or after) installing it with dpkg -i xxx.deb, otherwise it may not be easy to find out where the binary is.

For reference, it should be at /opt/MegaRAID/perccli/perccli64.
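To make the dpkg -c step above less manual, here is a minimal Python sketch that filters the listing down to regular files with an execute bit. The function names are hypothetical, and it assumes the usual tar-style dpkg -c layout of six whitespace-separated columns (mode, owner, size, date, time, path), so treat it as an illustration rather than a robust parser.

```python
import subprocess
from typing import List


def executables_in_listing(listing: str) -> List[str]:
    """Return paths of regular files that have an execute bit set.

    `listing` is the text printed by `dpkg -c`, assumed to have six
    whitespace-separated columns: mode, owner, size, date, time, path.
    """
    paths = []
    for line in listing.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # not a listing line
        mode, path = fields[0], fields[5]
        # Regular files start with "-"; directories ("d") and symlinks ("l") are skipped.
        if mode.startswith("-") and "x" in mode:
            # dpkg prints paths relative to the root, e.g. "./opt/...".
            paths.append(path[1:] if path.startswith("./") else path)
    return paths


def list_deb_executables(deb_path: str) -> List[str]:
    """Run `dpkg -c` on a .deb file and return the executables it contains."""
    out = subprocess.run(
        ["dpkg", "-c", deb_path], check=True, capture_output=True, text=True
    ).stdout
    return executables_in_listing(out)
```

Calling list_deb_executables("xxx.deb") with the real package file name should then show where perccli64 ends up after installation.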

  • 1
    The traditional megacli can also be installed on Debian 11 with the H750 card (without the system hang mentioned in the question), but in my few tests it found no cards or drives. And "megaclisas-status" does cause the system hang as mentioned above. – Ning Yu Fisher Mar 29 '22 at 12:29
  • 1
    We successfully installed the perccli64 program. It seems to work quite well, at least to query the controller for status information. We haven’t tried to change the controller configuration. We also still have to find a way to integrate this into our monitoring setup. – jlecour Sep 17 '22 at 20:54
0

It turns out that the offending command that blocks is megacli -AdpAllInfo -a0 -NoLog; perccli64 blocks on this command just as megacli does.

One can patch megaclisas-status to avoid that command:

diff --git a/megaclisas-status b/megaclisas-status
index 870e3a5..a9bc55b 100755
--- a/megaclisas-status
+++ b/megaclisas-status
@@ -27,7 +27,7 @@ nagiosbaddisk = 0
 
 # Sane defaults
 printarray = True
-printcontroller = True
+printcontroller = False
 debugmode = False
 notempmode = False
 totaldrivenumber = 0

We lose the controller and BBU information, but the megaclisas-status daemon can still be used to monitor the status of the array, without any heavy refactoring for perccli64 (not to mention that perccli64 /c0 show all segfaults for us anyway, on an R540 + H750).
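Since these tools can hang or crash on this hardware, a monitoring script might guard every call with a timeout. The following is only a sketch under assumptions: run_with_timeout is a hypothetical helper, not part of megaclisas-status, and the 30-second default is arbitrary. It also only helps while the process is still killable. If the tool is stuck in an uninterruptible kernel wait, the kill has no effect and the call will block as well.

```python
import subprocess
from typing import List, Optional


def run_with_timeout(cmd: List[str], timeout_s: float = 30.0) -> Optional[str]:
    """Run `cmd` and return its stdout, or None on timeout, crash or error.

    A segfault shows up as a negative return code, so it is reported
    as None just like any other non-zero exit status.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None  # the child was killed after timeout_s seconds
    except OSError:
        return None  # binary missing or not executable
    if result.returncode != 0:
        return None
    return result.stdout
```

For example, run_with_timeout(["/opt/MegaRAID/perccli/perccli64", "/c0", "show", "all"]) would return None instead of crashing the caller when the command segfaults as described above.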