mdadm RAID5 random read errors. Dying disk?

Question

First the long story:
I have a RAID5 with mdadm on Debian 9. The array has 5 disks of 4 TB each: 4 of them are HGST Deskstar NAS, and one that was added later is a Toshiba N300 NAS.

In the past few days I noticed some read errors from that array. For example, I had a 10 GB RAR archive in multiple parts. When I try to extract it, I get CRC errors on some of the parts. If I try a second time, I get these errors on other parts. The same happens with torrents when I re-check them after download.

After a reboot my BIOS notified me that the S.M.A.R.T. status of an HGST drive on SATA port 3 is bad. smartctl told me that there were DMA CRC errors, but claimed that the drive was OK.

Another reboot later, I can't see the CRC errors in the SMART data anymore. But now I get this output:

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-4-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1989
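
For reading the table above: smartctl reports an attribute as failed once its normalized VALUE is less than or equal to THRESH. A minimal sketch that applies that rule to the line above; the helper name `smart_attr_status` is made up for illustration, and the column positions assume the attribute layout shown here.

```shell
# Hypothetical helper: classify one smartctl attribute line read from stdin.
# Assumed columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
smart_attr_status() {
    awk '{
        value  = $4 + 0    # normalized value, "001" becomes 1
        thresh = $6 + 0    # vendor threshold, "005" becomes 5
        status = (value <= thresh) ? "FAILED" : "ok"
        print $2, status, "value=" value, "thresh=" thresh, "raw=" $10
    }'
}

echo '  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1989' | smart_attr_status
```

For this drive the normalized value (1) is below the threshold (5), which is why the overall assessment is FAILED; the raw value 1989 is the figure referred to further down as nearly 2000 reallocated sectors.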

As the HGST drives aren't available at normal prices anymore, I bought another Toshiba N300 to replace the HGST. Both are labeled as 4 TB. I tried to create a partition of exactly the same size, but it didn't work. The partitioning program claimed that my number was too big (I tried it with both bytes and sectors). So I just made the partition as big as possible. But now it looks like it is the same size, and I'm a bit confused.

sdc is the old disk, and sdh is the new one:

Disk /dev/sdc: 3,7 TiB, 4000787030016 bytes, 7814037168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 4CAD956D-E627-42D4-B6BB-53F48DF8AABC

Device     Start        End    Sectors  Size Type
/dev/sdc1   2048 7814028976 7814026929  3,7T Linux RAID


Disk /dev/sdh: 3,7 TiB, 4000787030016 bytes, 7814037168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 3A173902-47DE-4C96-8360-BE5DBED1EAD3

Device     Start        End    Sectors  Size Type
/dev/sdh1   2048 7814037134 7814035087  3,7T Linux filesystem

Currently I have added the new one as a spare disk. The RAID is still working with the old drive. I still get some read errors, especially on big files.

This is how my RAID currently looks:

/dev/md/0:
        Version : 1.2
  Creation Time : Sun Dec 17 22:03:20 2017
     Raid Level : raid5
     Array Size : 15627528192 (14903.57 GiB 16002.59 GB)
  Used Dev Size : 3906882048 (3725.89 GiB 4000.65 GB)
   Raid Devices : 5
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sat Jan  5 09:48:49 2019
          State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

           Name : SERVER:0  (local to host SERVER)
           UUID : 16ee60d0:f055dedf:7bd40adc:f3415deb
         Events : 25839

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8       33        1      active sync   /dev/sdc1
       3       8        1        2      active sync   /dev/sda1
       4       8       17        3      active sync   /dev/sdb1
       5       8       80        4      active sync   /dev/sdf

       6       8      113        -      spare   /dev/sdh1
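
The sizes in the fdisk and mdadm output above can be compared directly. A small arithmetic sketch using only the numbers printed there; the variable names are made up, and it assumes 512-byte logical sectors (as both fdisk listings report) and that mdadm prints "Used Dev Size" in KiB.

```shell
# Sector counts copied from the fdisk output (512-byte logical sectors).
old_part_sectors=7814026929    # /dev/sdc1
new_part_sectors=7814035087    # /dev/sdh1

# "Used Dev Size" from mdadm --detail, in KiB; 1 KiB = 2 sectors of 512 bytes.
used_dev_kib=3906882048
used_dev_sectors=$((used_dev_kib * 2))

extra_sectors=$((new_part_sectors - old_part_sectors))
echo "sdh1 is larger than sdc1 by $extra_sectors sectors ($((extra_sectors * 512)) bytes)"
echo "md uses $used_dev_sectors sectors of each member"

# sdc1 already works as a member, so anything at least as big should too.
if [ "$new_part_sectors" -ge "$old_part_sectors" ]; then
    echo "sdh1 is not smaller than the existing member sdc1"
fi
```

Both partitions show up as 3,7T only because fdisk rounds the size; the new partition is in fact a few thousand sectors larger than the old one.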

And the disk structure is this

NAME                       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                          8:0    0   3,7T  0 disk
└─sda1                       8:1    0   3,7T  0 part
  └─md0                      9:0    0  14,6T  0 raid5
    └─storageRaid          253:4    0  14,6T  0 crypt
      └─vg_raid-raidVolume 253:5    0  14,6T  0 lvm   /media/raidVolume
sdb                          8:16   0   3,7T  0 disk
└─sdb1                       8:17   0   3,7T  0 part
  └─md0                      9:0    0  14,6T  0 raid5
    └─storageRaid          253:4    0  14,6T  0 crypt
      └─vg_raid-raidVolume 253:5    0  14,6T  0 lvm   /media/raidVolume
sdc                          8:32   0   3,7T  0 disk
└─sdc1                       8:33   0   3,7T  0 part
  └─md0                      9:0    0  14,6T  0 raid5
    └─storageRaid          253:4    0  14,6T  0 crypt
      └─vg_raid-raidVolume 253:5    0  14,6T  0 lvm   /media/raidVolume
sdd                          8:48   0   3,7T  0 disk
└─sdd1                       8:49   0   3,7T  0 part
  └─md0                      9:0    0  14,6T  0 raid5
    └─storageRaid          253:4    0  14,6T  0 crypt
      └─vg_raid-raidVolume 253:5    0  14,6T  0 lvm   /media/raidVolume
sdf                          8:80   1   3,7T  0 disk
└─md0                        9:0    0  14,6T  0 raid5
  └─storageRaid            253:4    0  14,6T  0 crypt
    └─vg_raid-raidVolume   253:5    0  14,6T  0 lvm   /media/raidVolume
sdh                          8:112  1   3,7T  0 disk
└─sdh1                       8:113  1   3,7T  0 part
  └─md0                      9:0    0  14,6T  0 raid5
    └─storageRaid          253:4    0  14,6T  0 crypt
      └─vg_raid-raidVolume 253:5    0  14,6T  0 lvm   /media/raidVolume

I'm a bit confused that the spare disk (sdh) is already in the crypt volume.

Questions:
Under what criteria will mdadm mark a disk as failed?
Can the random read errors come from one broken disk?
Doesn't the RAID detect it when a disk returns wrong data?
Is it dangerous to manually mark a disk as failed when the spare disk is not exactly the same size?

Apart from being "too broad", it's also off-topic ("consumer workstations or networking (which belong on our sister site, Super User)") – https://serverfault.com/help/on-topic – poige, Jan 05 '19 at 18:35
I had the same problem. I checked the disks with smartctl -a long and the disks are OK; there were some problems, but they were corrected (fewer than 7). Still, I can't build the array (RAID 6). I can force it with this command, but if a disk fails, that would be a big problem. – Fire_Wolf.cl, Sep 11 '20 at 01:05

Halfgaar · Accepted Answer · 2019-01-05T22:03:38.263

5

MD raid is far too conservative with kicking out disks, in my opinion. I always watch for ATA exceptions in the syslog/dmesg (I set rsyslog to notify me on those).

I must say I am surprised that you get errors at the application level. RAID5 should use the parity information to detect errors (edit: apparently it doesn't, only during verification). Having said that, whether the disk is the cause or not, it's in bad shape. Nearly 2000 reallocated sectors is really bad.

Partitions can be bigger (otherwise you couldn't have added it as a spare either), but to be sure everything is fine, you can clone partition tables using fdisk, sfdisk or gdisk. You have GPT, so let's use its backup feature. If you run `gdisk /dev/sdX`, you can use `b` to back the partition table up to a file. Then, on the new disk, run `gdisk /dev/sdY`, use `r` for the recovery options, then `l` to load the backup. You should then have an identical partition, and all `mdadm --manage --add` commands should work. (You will need to take the new disk out of the array before changing the partition table.)
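
The same backup-and-load steps can also be scripted with sgdisk, the non-interactive counterpart of gdisk. This is only a sketch under assumptions: `clone_gpt`, `run` and the backup path are made-up names, /dev/sdX and /dev/sdY are placeholders as above, and it only prints the commands unless DRY_RUN=0 is set. The last step gives the copy fresh GUIDs, since a restored backup otherwise carries the same disk and partition GUIDs as the source.

```shell
# Dry-run by default: print the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: copy the GPT of SRC onto DST via a backup file.
clone_gpt() {
    src="$1"; dst="$2"; backup="$3"
    run sgdisk --backup="$backup" "$src"        # save the GPT of the source disk
    run sgdisk --load-backup="$backup" "$dst"   # write it to the target disk
    run sgdisk --randomize-guids "$dst"         # new disk and partition GUIDs
}

clone_gpt /dev/sdX /dev/sdY /root/sdX-gpt.backup
```

Double-check which device is the source and which is the target before setting DRY_RUN=0, because loading the backup overwrites the partition table on the target.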

I actually tend to keep those backup partition tables around on servers. It makes for fast disk replacements.

And a final piece of advice: don't use RAID5. RAID5 with such huge disks is flaky. You should be able to add a disk and dynamically migrate to RAID6. I'm not sure how off the top of my head, but you can Google that.
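
For reference, the migration is usually done with `mdadm --grow`. A hedged sketch, not tested against this array: it assumes the array is /dev/md0, that an extra healthy disk has already been added as a spare so RAID6 can use six devices, and that the backup file lives on a filesystem outside the array. `raid5_to_raid6`, `run` and the backup path are made-up names, and the command is only printed unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: reshape ARRAY to RAID6 with DEVICES active members.
raid5_to_raid6() {
    array="$1"; devices="$2"; backup="$3"
    run mdadm --grow "$array" --level=6 --raid-devices="$devices" --backup-file="$backup"
}

raid5_to_raid6 /dev/md0 6 /root/md0-reshape.backup
```

The reshape runs in the background and its progress shows up in /proc/mdstat. It is probably wiser to replace the failing disk first and reshape afterwards, rather than reshaping while a member is returning errors.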

edited Jan 05 '19 at 22:03

answered Jan 05 '19 at 13:17

Halfgaar


I already added the new disk (sdh1) with mdadm --manage --add and it is there as a spare disk now. Can I just mark the old device as failed? Or should I remove the spare again and create a partition from a gdisk backup? I read somewhere that I need to set a new UUID if I install the partition backup. – kevinq Jan 05 '19 at 13:39
“RAID5 should use the parity information to detect errors. ” — what do you mean “should”? Linux software RAID doesn’t work that way – poige Jan 05 '19 at 14:46
@kevinq, it will work if you mark the old device as failed. You will just have to remember that in the future, new disks only need to have partitions as big as the smallest drive. As for the UUID, I think you're right. If you have a new enough `sfdisk` (with GPT support), you can see the UUIDs with `sfdisk -d /dev/sdX`. I used to clone MBR partition tables with `sfdisk -d /dev/sda | sfdisk /dev/sdb`, but with UUIDs, I need to rethink this. – Halfgaar Jan 05 '19 at 15:57
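
Marking the old member as failed and removing it could look roughly like this. A sketch only, assuming the array is /dev/md0 and the old member is /dev/sdc1 as in the question; `retire_member` and `run` are made-up names, and the commands are only printed unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: mark MEMBER as faulty, then detach it from ARRAY.
retire_member() {
    array="$1"; member="$2"
    run mdadm --manage "$array" --fail "$member"     # md starts rebuilding onto the spare
    run mdadm --manage "$array" --remove "$member"   # detach the faulty device
}

retire_member /dev/md0 /dev/sdc1
```

The rebuild onto the spare can be followed in /proc/mdstat, and the array runs without redundancy until it finishes. mdadm also has `--replace` with `--with`, which copies onto the spare while the old member stays active; whether that is preferable for a disk that may be returning bad data is a judgment call.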
@poige can you elaborate? You're saying (Linux software) RAID5 doesn't use the parity to verify what it just read? If not, I stand corrected, and what the OP sees is not unexpected, though the drive should have given an error instead of returning wrong data. – Halfgaar Jan 05 '19 at 16:02
I'm saying what I'm saying; the question is why you state things before checking them first(?) – poige Jan 05 '19 at 18:23
As far as I'm aware, *nobody's* RAID uses the parity information to verify the read. The closest I'm aware of is ZFS RAID-Z and BTRFS RAID, which use checksums to verify the read, and then the parity information to recover if the checksum fails. – Mark Jan 06 '19 at 00:40
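
A toy illustration of that difference, not how md, ZFS or Btrfs are actually implemented: three single-byte "blocks" with XOR parity, and `cksum` standing in for a per-block checksum. All names and values are made up.

```shell
# Toy stripe: three data "blocks" (single bytes here) plus XOR parity.
d0=165; d1=60; d2=15
parity=$((d0 ^ d1 ^ d2))

# Hypothetical per-block checksum, stored at write time (cksum is a stand-in).
block_sum() { printf '%s' "$1" | cksum | cut -d' ' -f1; }
sum1=$(block_sum "$d1")

# The disk silently returns a wrong value for block 1.
d1_read=61

# Plain RAID5 read path: the block is handed to the caller, parity is not consulted.
echo "plain read returns: $d1_read"

# A scrub recomputes parity and sees a mismatch, but cannot tell which block is wrong.
if [ $((d0 ^ d1_read ^ d2)) -ne "$parity" ]; then
    echo "scrub: parity mismatch somewhere in the stripe"
fi

# With a checksum the bad block is identified and rebuilt from the others plus parity.
if [ "$(block_sum "$d1_read")" != "$sum1" ]; then
    d1_fixed=$((parity ^ d0 ^ d2))
    echo "checksum mismatch on block 1, rebuilt value: $d1_fixed"
fi
```

Parity alone only says that the stripe is inconsistent; it takes a separate checksum to point at the block that needs rebuilding.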

score 4 · Answer 2 · answered Jan 05 '19 at 14:58

It’s pretty common to have a cron task that initiates parity mismatch checks. I’m pretty sure Debian 9 sets this up by default when the mdadm package is installed, so your system’s logs should contain the corresponding reports.
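
Such a check can also be started and inspected by hand through the md sysfs files. A minimal sketch, assuming the array is md0 as in the question; the function names and the MD_SYSFS variable are made up, and writing to sync_action needs root on a real system.

```shell
# md exposes scrub control under /sys/block/<array>/md; md0 matches the question.
MD_SYSFS="${MD_SYSFS:-/sys/block/md0/md}"

# Start a read-only consistency check of the whole array.
md_start_check() { echo check > "$MD_SYSFS/sync_action"; }

# Current action, for example "idle" or "check".
md_action() { cat "$MD_SYSFS/sync_action"; }

# Mismatch counter reported by the most recent check.
md_mismatch_count() { cat "$MD_SYSFS/mismatch_cnt"; }
```

Typical use would be `md_start_check`, waiting until `md_action` prints `idle` again, then reading `md_mismatch_count`. A non-zero counter after a check means data and parity disagree somewhere in the array.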

Besides, if your system’s RAM is failing, that might be the primary cause.

answered Jan 05 '19 at 14:58

poige


To test the system RAM, the OP can run memtest86. – Halfgaar Jan 05 '19 at 15:43
whatever, that's another story – poige Jan 05 '19 at 18:20
