
I'm having a hard time identifying which drive is failing in my HP ProLiant DL360p Gen8, which has a Smart Array P420i RAID controller. I see tons of errors like these in dmesg:

[40425140.998750] sd 0:1:0:1: [sdb] Unaligned partial completion (resid=16312, sector_sz=512)
[40425140.998763] sd 0:1:0:1: [sdb] tag#597 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[40425140.998767] sd 0:1:0:1: [sdb] tag#597 Sense Key : 0x3 [current] 
[40425140.998770] sd 0:1:0:1: [sdb] tag#597 ASC=0x11 ASCQ=0x0 
[40425140.998775] sd 0:1:0:1: [sdb] tag#597 CDB: opcode=0x88 88 00 00 00 00 00 3c 17 fa f8 00 00 00 08 00 00
[40425140.998778] print_req_error: critical medium error, dev sdb, sector 1008204536
[40425141.001176] sd 0:1:0:1: [sdb] Unaligned partial completion (resid=16312, sector_sz=512)
[40425141.001186] sd 0:1:0:1: [sdb] tag#597 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[40425141.001189] sd 0:1:0:1: [sdb] tag#597 Sense Key : 0x3 [current] 
[40425141.001193] sd 0:1:0:1: [sdb] tag#597 ASC=0x11 ASCQ=0x0 
[40425141.001197] sd 0:1:0:1: [sdb] tag#597 CDB: opcode=0x88 88 00 00 00 00 00 3c 17 fa f8 00 00 00 08 00 00
[40425141.001199] print_req_error: critical medium error, dev sdb, sector 1008204536
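For reference, the CDB in these messages can be decoded by hand. The sketch below assumes the standard SCSI READ(16) layout (opcode 0x88, an 8-byte big-endian LBA in bytes 2-9, a 4-byte transfer length in bytes 10-13). `decode_read16` is a hypothetical helper name used only for illustration, not part of any tool.

```python
def decode_read16(cdb_hex: str) -> dict:
    """Decode a SCSI READ(16) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        # Bytes 2-9: logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:10], "big"),
        # Bytes 10-13: number of blocks requested, big-endian.
        "blocks": int.from_bytes(cdb[10:14], "big"),
    }

# CDB bytes copied from the dmesg output above (the part after "opcode=0x88").
info = decode_read16("88 00 00 00 00 00 3c 17 fa f8 00 00 00 08 00 00")
print(info)  # {'lba': 1008204536, 'blocks': 8}
```

The decoded LBA matches the sector in the print_req_error line, and sense key 0x3 with ASC 0x11 / ASCQ 0x0 is the SCSI code for a medium error (unrecovered read error). Note that this LBA is an address on the logical drive the controller exposes as sdb, so by itself it does not say which physical drive holds the bad block.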

I assume these mean that one of the drives in the RAID array behind sdb is failing. Unfortunately, I can't run self-tests, and I can't see the SMART attributes of the individual drives. Here's the output of smartctl:

sudo smartctl -a /dev/sdb -d cciss,2
[sudo] password for hypervisor: 
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.14.12-2-bfq-mq] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HITACHI
Product:              HUC10606 CLAR600
Revision:             C3B0
Compliance:           SPC-4
User Capacity:        600,000,000,000 bytes [600 GB]
Logical block size:   512 bytes
Rotation Rate:        10020 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000cca03ca74bd0
Serial number:        PZJZ06RD
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Dec 16 13:19:33 2019 CET
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     28 C
Drive Trip Temperature:        85 C

Manufactured in week 20 of year 2013
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  122
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  2022
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 7447546408787247104

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2241169767  2284343         0  2243454110   16825755     105146.414           0
write:         0   112305         0    112305     112316      60991.348           0
verify: 722111517   138380         0  722249897      90047      25945.352           0

Non-medium error count:        0

No self-tests have been logged
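Since `-d cciss,N` addresses one physical drive behind the controller, the same query can be repeated for every index and the results compared side by side. The following is a minimal sketch under some assumptions: the index range 0-7 is a guess based on the eight drives in this server, `parse_smartctl` and `probe` are hypothetical helper names, and the parser only understands the SCSI/SAS output format shown above (SATA drives are reported in a different layout and would come back with empty fields).

```python
import re
import subprocess

def parse_smartctl(text: str) -> dict:
    """Pull the fields useful for comparing drives out of `smartctl -a` SCSI output."""
    out = {"serial": None, "grown_defects": None, "uncorrected": {}}
    m = re.search(r"^Serial number:\s*(\S+)", text, re.M)
    if m:
        out["serial"] = m.group(1)
    m = re.search(r"^Elements in grown defect list:\s*(\d+)", text, re.M)
    if m:
        out["grown_defects"] = int(m.group(1))
    # Error counter rows: the last column is "Total uncorrected errors".
    for op, rest in re.findall(r"^(read|write|verify):\s+(.+)$", text, re.M):
        out["uncorrected"][op] = int(rest.split()[-1])
    return out

def probe(device: str = "/dev/sdb", indexes=range(8)) -> dict:
    """Run smartctl once per cciss index (needs root); the index range is an assumption."""
    results = {}
    for n in indexes:
        proc = subprocess.run(
            ["smartctl", "-a", device, "-d", f"cciss,{n}"],
            capture_output=True, text=True,
        )
        results[n] = parse_smartctl(proc.stdout)
    return results
```

Calling `probe()` as root and printing the result gives one row per drive. A drive whose grown defect list or uncorrected error totals stand out from the others would be the first one to look at more closely.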

Here's my array setup:

sudo ssacli ctrl slot=0 pd all show detail

Smart Array P420i in Slot 0 (Embedded)

   Array A

      physicaldrive 1I:1:1
         Port: 1I
         Box: 1
         Bay: 1
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Firmware Revision: X61130WD
         Serial Number: 173566421696
         WWID: 3001438031683380
         Model: ATA     WDC WDS500G2B0A-
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: True
         Unrestricted Sanitize Supported: True
         Shingled Magnetic Recording Support: None

      physicaldrive 1I:1:2
         Port: 1I
         Box: 1
         Bay: 2
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Firmware Revision: X61130WD
         Serial Number: 173566420063
         WWID: 3001438031683381
         Model: ATA     WDC WDS500G2B0A-
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: True
         Unrestricted Sanitize Supported: True
         Shingled Magnetic Recording Support: None


   Array B

      physicaldrive 1I:1:3
         Port: 1I
         Box: 1
         Bay: 3
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 600 GB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Rotational Speed: 10000
         Firmware Revision: C3B0
         Serial Number: PZJZ06RD
         WWID: 5000CCA03CA74BD1
         Model: HITACHI HUC10606 CLAR600
         Current Temperature (C): 28
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False
         Shingled Magnetic Recording Support: None

      physicaldrive 1I:1:4
         Port: 1I
         Box: 1
         Bay: 4
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 600 GB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Rotational Speed: 10000
         Firmware Revision: C3B0
         Serial Number: PZJE23KD
         WWID: 5000CCA03C887F15
         Model: HITACHI HUC10606 CLAR600
         Current Temperature (C): 30
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False
         Shingled Magnetic Recording Support: None

      physicaldrive 2I:1:5
         Port: 2I
         Box: 1
         Bay: 5
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 600 GB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Rotational Speed: 10000
         Firmware Revision: C3B0
         Serial Number: PZJAL0JD
         WWID: 5000CCA03C83F969
         Model: HITACHI HUC10606 CLAR600
         Current Temperature (C): 27
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False
         Shingled Magnetic Recording Support: None

      physicaldrive 2I:1:6
         Port: 2I
         Box: 1
         Bay: 6
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 600 GB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Rotational Speed: 10000
         Firmware Revision: C3B0
         Serial Number: PZK1EU5D
         WWID: 5000CCA03CABBAED
         Model: HITACHI HUC10606 CLAR600
         Current Temperature (C): 29
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False
         Shingled Magnetic Recording Support: None

      physicaldrive 2I:1:7
         Port: 2I
         Box: 1
         Bay: 7
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 600 GB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Rotational Speed: 10000
         Firmware Revision: C3B0
         Serial Number: PZJJ37VD
         WWID: 5000CCA03C8E04A1
         Model: HITACHI HUC10606 CLAR600
         Current Temperature (C): 29
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False
         Shingled Magnetic Recording Support: None

      physicaldrive 2I:1:8
         Port: 2I
         Box: 1
         Bay: 8
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 600 GB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Rotational Speed: 10000
         Firmware Revision: C3B0
         Serial Number: PZJAH46D
         WWID: 5000CCA03C83CE25
         Model: HITACHI HUC10606 CLAR600
         Current Temperature (C): 28
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False
         Shingled Magnetic Recording Support: None
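To tie a serial number back to a physical bay, the `ssacli` listing above can be turned into a lookup table. This is only a sketch that assumes the output keeps the layout shown here (a `physicaldrive <port>:<box>:<bay>` line followed by indented `Key: Value` lines); `parse_ssacli_pd` is a hypothetical helper name.

```python
import re

def parse_ssacli_pd(text: str) -> dict:
    """Map serial number -> drive location from `ssacli ... pd all show detail` output."""
    drives = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"physicaldrive (\S+)$", line)
        if m:
            # Start of a new drive section, e.g. "physicaldrive 1I:1:3".
            current = m.group(1)
            continue
        if current and line.startswith("Serial Number:"):
            serial = line.split(":", 1)[1].strip()
            drives[serial] = current
    return drives
```

With the listing above as input, looking up `PZJZ06RD` (the serial reported by `smartctl -d cciss,2`) returns `1I:1:3`, i.e. the drive in bay 3.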

Is there a way to identify which drive is generating critical medium errors in the second array? I don't want to replace every single drive if only one of them is failing...

user3125731
  • Do you have iLO on the box? Any light indicators on the actual disks? – Smock Dec 16 '19 at 13:49
  • I do. I can log in to iLO remote management utility. After going to `Information > System Information > Storage` I see a full report for each array and individual drives. The status of each drive is marked as "OK". It looks like [this](https://i.imgur.com/gQae0cO.png). I downloaded the Active Health System Log file but I'm unable to analyse it - there's no publicly available tool that allows opening `.ahs` files. – user3125731 Dec 16 '19 at 14:05
  • Is the firmware up to date on the disks / controller ? – Smock Dec 16 '19 at 14:12
  • It wasn't. I've just updated the iLO firmware from 2.5 (released in 2016) to 2.7 (May 14, 2019). After updating, I rechecked all the information I provided here (both in the question and in my previous comment). Nothing has changed: the drives' status is still marked as "OK" and the results of `ssacli` and `smartctl` are the same. I don't think that upgrading the firmware of individual drives ([firmware for hitachi](https://www.dell.com/support/home/pl/pl/plbsd1/drivers/driversdetails?driverid=r1myy)) is possible when they're behind this RAID controller. – user3125731 Dec 16 '19 at 14:54
  • Is there no update for SAS firmware/driver? (Sorry, not used that particular HP server, only 380/385's) – Smock Dec 16 '19 at 15:08
  • Could you please attach the SSA/ADU report in a reply? To generate the report, run `ssaducli -f adu-report.zip` – srinivassc Apr 25 '20 at 14:12

0 Answers