How to interpret conflicting status data between `smartctl` and `mdadm`?

Question

I have an HDD that is a member of a RAID10 array.
smartctl and mdadm give me conflicting indications of its status.

smartctl claims that the disk is FAILED:

$ sudo smartctl -H /dev/sdf
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.0.0-2.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 2005

But mdadm claims that the HDD is fine (active sync):

$ sudo mdadm --detail /dev/md0 
/dev/md0:
           Version : 1.2
     Creation Time : Fri Jan 31 12:52:57 2020
        Raid Level : raid10
        Array Size : 11720531968 (11177.57 GiB 12001.82 GB)
     Used Dev Size : 2930132992 (2794.39 GiB 3000.46 GB)
      Raid Devices : 8
     Total Devices : 8
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu Jun  4 00:29:34 2020
             State : clean 
    Active Devices : 8
   Working Devices : 8
    Failed Devices : 0
     Spare Devices : 0

            Layout : near=2
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : 24port:0  (local to host 24port)
              UUID : 3d7b58f8:29553a3d:fbbc536e:8bb95424
            Events : 40771

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync set-A   /dev/sda1
       8      65       16        1      active sync set-B   /dev/sdr
       2       8       17        2      active sync set-A   /dev/sdb1
       3       8       33        3      active sync set-B   /dev/sdc1
       4       8       49        4      active sync set-A   /dev/sdd1
       5       8       65        5      active sync set-B   /dev/sde1
       6       8       81        6      active sync set-A   /dev/sdf1
       7       8       97        7      active sync set-B   /dev/sdg1

Whom should I believe?

score 2 · Answer 1 · answered Jun 05 '20 at 07:48

These are not conflicting. The SMART health test result is FAILED, but the drive itself has not failed (yet). The drive is currently working, as the mdadm output indicates. According to smartctl, however, it is dying and has little life left: it has used up almost all of its spare sectors reallocating data (the normalized Reallocated_Sector_Ct value of 001 is below its threshold of 005).
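To see why smartctl flags the drive, it can help to compare the normalized VALUE column against THRESH for the failing attribute. The following is only a minimal shell sketch, not something smartctl provides: `smart_attr_state` is a hypothetical helper, and the field positions assume the attribute table layout shown in the question.

```shell
#!/bin/sh
# The failing attribute line, copied from the smartctl output in the question.
line='  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 2005'

# Hypothetical helper: an attribute counts as failing when its normalized
# VALUE (field 4) is less than or equal to THRESH (field 6).
# Field 10 is the raw value (here, the count of reallocated sectors).
smart_attr_state() {
  printf '%s\n' "$1" | awk '{
    value  = $4 + 0
    thresh = $6 + 0
    state  = (value <= thresh) ? "FAILING" : "OK"
    printf "%s value=%d thresh=%d raw=%s %s\n", $2, value, thresh, $10, state
  }'
}

smart_attr_state "$line"
# prints: Reallocated_Sector_Ct value=1 thresh=5 raw=2005 FAILING
```

This only looks at the SMART attribute. It says nothing about whether md has seen an I/O error, which is why mdadm can still report the member as active sync.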

I would replace that disk, and quickly if the array holds anything critical. It may be worth getting the manufacturer's diagnostic tool and checking the drive with it, in case smartctl is wrong. Usually it is not, though, so your best bet is to replace the disk while it still works.
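A replacement along those lines could look roughly like the sketch below. It assumes the array and member names from the question (`/dev/md0`, `/dev/sdf1`), that the new disk shows up under the same device name and has been partitioned to match, and that the other copy of the data in the mirror pair is healthy. The `run`, `retire_member`, and `add_replacement` helpers and the `DRY_RUN` variable are made up for illustration; by default the script only prints the commands.

```shell
#!/bin/sh
ARRAY=/dev/md0          # array from the question
MEMBER=/dev/sdf1        # failing member from the question
DRY_RUN=${DRY_RUN:-1}   # assumption: print commands unless DRY_RUN=0

# Hypothetical wrapper: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: mark the member as faulty and remove it from the array.
retire_member() {
  run mdadm --manage "$ARRAY" --fail "$MEMBER"
  run mdadm --manage "$ARRAY" --remove "$MEMBER"
}

# Step 2 (after physically swapping and partitioning the new disk):
# add the new partition and check the rebuild progress.
# The device name may differ after the swap; verify it first.
add_replacement() {
  run mdadm --manage "$ARRAY" --add "$MEMBER"
  run cat /proc/mdstat
}

# Usage: script.sh retire   (then swap the disk)   script.sh add
case "${1:-}" in
  retire) retire_member ;;
  add)    add_replacement ;;
esac
```

Keeping the two steps separate is deliberate: the physical swap happens between them, so running both back to back would just re-add the failing disk.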

Lacek is totally right. SMART stats give you a leading indicator of failure. But md won't actually kick the drive out until it fails a read or a write. That day is coming soon for this drive. — Mike Andrews, Jun 05 '20 at 19:03
