
My RAID1 goes into a degraded state from time to time. My application then crashes, because the array ends up in read-only mode. After a reboot the RAID works fine again. Now I want to find out the root cause of this error. Maybe someone has a tip for where I can start looking.

This is the state after a reboot; it works fine like this for some days:

root@node:~# sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Tue May 17 21:43:06 2022
        Raid Level : raid1
        Array Size : 1953382464 (1862.89 GiB 2000.26 GB)
     Used Dev Size : 1953382464 (1862.89 GiB 2000.26 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu Jun 30 11:05:30 2022
             State : active
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : node:0  (local to host node)
              UUID : 449cfe85:fb2d3888:83ff4d80:3b4b007d
            Events : 26471

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb

thats the state after a "unknown" event

root@node:/var/log# sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Tue May 17 21:43:06 2022
        Raid Level : raid1
        Array Size : 1953382464 (1862.89 GiB 2000.26 GB)
     Used Dev Size : 1953382464 (1862.89 GiB 2000.26 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu Jun 30 06:15:29 2022
             State : clean, degraded
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 1
     Spare Devices : 0

Consistency Policy : bitmap

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       8       16        1      active sync   /dev/sdb

       0       8        0        -      faulty   /dev/sda

Sometimes it is sdb and sometimes sda that fails. There is no pattern as to when it happens, and neither of the two drives is clearly the main offender. The SSDs are brand new and I have seen this behavior from the start. As I wrote, after a reboot the RAID is fine again.

/etc/mdadm/mdadm.conf

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR xyz@gmx.de
MAILFROM xyz@gmx.de

# definitions of existing MD arrays

# This configuration was auto-generated on Thu, 21 Apr 2022 01:01:03 +0000 by mkconf

ARRAY /dev/md0 level=raid1 num-devices=2 metadata=1.2 spares=0 name=node:0 UUID=449cfe85:fb2d3888:83ff4d80:3b4b007d
   devices=/dev/sda,/dev/sdb

If it is not possible to find the cause, is there a setting that prevents the RAID from switching to read-only mode? I thought a RAID is a high-availability solution, but when one of the two devices has an issue my applications crash because they can no longer write files to the disk.

System: Ubuntu 22.04 LTS, RAID1 --> 2x Samsung 870 EVO 2.5" SSD, 2 TB

cat /var/log/kern.log | grep md0

Jun 30 04:03:04 node kernel: [    4.970441] md/raid1:md0: not clean -- starting background reconstruction
Jun 30 04:03:04 node kernel: [    4.970446] md/raid1:md0: active with 2 out of 2 mirrors
Jun 30 04:03:04 node kernel: [    4.974972] md0: detected capacity change from 0 to 3906764928
Jun 30 04:03:04 node kernel: [    4.975043] md: resync of RAID array md0
Jun 30 04:03:04 node kernel: [    9.763722] EXT4-fs (md0): recovery complete
Jun 30 04:03:04 node kernel: [    9.768258] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Jun 30 04:03:04 node kernel: [   12.678657] md: md0: resync done.
Jun 30 06:14:53 node kernel: [ 7927.757074] md/raid1:md0: Disk failure on sda, disabling device.
Jun 30 06:14:53 node kernel: [ 7927.757074] md/raid1:md0: Operation continuing on 1 devices.
Jun 30 06:15:28 node kernel: [ 7962.903309] EXT4-fs warning (device md0): ext4_end_bio:342: I/O error 10 writing to inode 80478214 starting block 154449626)
Jun 30 06:15:28 node kernel: [ 7962.903312] Buffer I/O error on device md0, logical block 154449626
Jun 30 06:15:28 node kernel: [ 7962.903319] Buffer I/O error on dev md0, logical block 471859204, lost async page write
Jun 30 06:15:28 node kernel: [ 7962.903323] Buffer I/O error on dev md0, logical block 450888194, lost async page write
Jun 30 06:15:28 node kernel: [ 7962.903327] Buffer I/O error on dev md0, logical block 284164106, lost async page write
Jun 30 06:15:28 node kernel: [ 7962.903329] Buffer I/O error on dev md0, logical block 284164105, lost async page write
Jun 30 06:15:28 node kernel: [ 7962.903331] Buffer I/O error on dev md0, logical block 284164104, lost async page write
Jun 30 06:15:28 node kernel: [ 7962.903333] Buffer I/O error on dev md0, logical block 284164103, lost async page write
Jun 30 06:15:28 node kernel: [ 7962.903335] Buffer I/O error on dev md0, logical block 284164102, lost async page write
Jun 30 06:15:28 node kernel: [ 7962.903336] Buffer I/O error on dev md0, logical block 284164101, lost async page write
Jun 30 06:15:28 node kernel: [ 7962.903338] Buffer I/O error on dev md0, logical block 284164100, lost async page write
Jun 30 06:15:28 node kernel: [ 7962.903340] Buffer I/O error on dev md0, logical block 284164099, lost async page write
Jun 30 06:15:28 node kernel: [ 7962.903351] EXT4-fs warning (device md0): ext4_end_bio:342: I/O error 10 writing to inode 112728289 starting block 470803456)
Jun 30 06:15:28 node kernel: [ 7962.903352] Buffer I/O error on device md0, logical block 470803456
Jun 30 06:15:28 node kernel: [ 7962.903356] EXT4-fs warning (device md0): ext4_end_bio:342: I/O error 10 writing to inode 112728306 starting block 283967488)
Jun 30 06:15:28 node kernel: [ 7962.903357] EXT4-fs error (device md0): ext4_check_bdev_write_error:217: comm kworker/u64:2: Error while async write back metadata
Jun 30 06:15:28 node kernel: [ 7962.903372] Buffer I/O error on device md0, logical block 283967488
Jun 30 06:15:28 node kernel: [ 7962.903376] EXT4-fs warning (device md0): ext4_end_bio:342: I/O error 10 writing to inode 112728732 starting block 154806925)
Jun 30 06:15:28 node kernel: [ 7962.903378] Buffer I/O error on device md0, logical block 154806925
Jun 30 06:15:28 node kernel: [ 7962.903378] Buffer I/O error on device md0, logical block 283967489
Jun 30 06:15:28 node kernel: [ 7962.903379] Buffer I/O error on device md0, logical block 283967490
Jun 30 06:15:28 node kernel: [ 7962.903382] Aborting journal on device md0-8.
Jun 30 06:15:28 node kernel: [ 7962.903382] Buffer I/O error on device md0, logical block 283967491
Jun 30 06:15:28 node kernel: [ 7962.903385] Buffer I/O error on device md0, logical block 283967492
Jun 30 06:15:28 node kernel: [ 7962.903386] Buffer I/O error on device md0, logical block 283967493
Jun 30 06:15:28 node kernel: [ 7962.903387] Buffer I/O error on device md0, logical block 283967494
Jun 30 06:15:28 node kernel: [ 7962.903390] EXT4-fs error (device md0) in ext4_reserve_inode_write:5726: Journal has aborted
Jun 30 06:15:28 node kernel: [ 7962.903395] EXT4-fs error (device md0) in ext4_reserve_inode_write:5726: Journal has aborted
Jun 30 06:15:28 node kernel: [ 7962.903395] EXT4-fs error (device md0): ext4_dirty_inode:5922: inode #80478237: comm lnd: mark_inode_dirty error
Jun 30 06:15:28 node kernel: [ 7962.903397] EXT4-fs error (device md0): ext4_journal_check_start:83: comm tor: Detected aborted journal
Jun 30 06:15:28 node kernel: [ 7962.903398] EXT4-fs error (device md0) in ext4_dirty_inode:5923: Journal has aborted
Jun 30 06:15:28 node kernel: [ 7962.903399] EXT4-fs error (device md0): ext4_dirty_inode:5922: inode #80478214: comm lnd: mark_inode_dirty error
Jun 30 06:15:28 node kernel: [ 7962.903403] EXT4-fs error (device md0) in ext4_reserve_inode_write:5726: Journal has aborted
Jun 30 06:15:28 node kernel: [ 7962.903406] EXT4-fs error (device md0) in ext4_dirty_inode:5923: Journal has aborted
Jun 30 06:15:28 node kernel: [ 7962.903407] EXT4-fs error (device md0): mpage_map_and_submit_extent:2497: inode #80478214: comm kworker/u64:2: mark_inode_dirty error
Jun 30 06:15:28 node kernel: [ 7962.908521] EXT4-fs warning (device md0): ext4_end_bio:342: I/O error 10 writing to inode 80478214 starting block 154449627)
Jun 30 06:15:28 node kernel: [ 7962.908525] EXT4-fs (md0): I/O error while writing superblock
Jun 30 06:15:28 node kernel: [ 7962.908531] JBD2: Error -5 detected when updating journal superblock for md0-8.
Jun 30 06:15:28 node kernel: [ 7962.908542] EXT4-fs (md0): I/O error while writing superblock
Jun 30 06:15:28 node kernel: [ 7962.908544] EXT4-fs (md0): Remounting filesystem read-only
Jun 30 06:15:28 node kernel: [ 7962.908545] EXT4-fs (md0): failed to convert unwritten extents to written extents -- potential data loss!  (inode 80478214, error -30)
Jun 30 06:15:28 node kernel: [ 7962.908550] EXT4-fs (md0): I/O error while writing superblock
Jun 30 06:15:28 node kernel: [ 7962.908560] EXT4-fs (md0): I/O error while writing superblock
Jun 30 06:32:13 node kernel: [    5.076652] md/raid1:md0: not clean -- starting background reconstruction
Jun 30 06:32:13 node kernel: [    5.076658] md/raid1:md0: active with 2 out of 2 mirrors
Jun 30 06:32:13 node kernel: [    5.081202] md0: detected capacity change from 0 to 3906764928
Jun 30 06:32:13 node kernel: [    5.081262] md: resync of RAID array md0
Jun 30 06:32:13 node kernel: [    8.971854] EXT4-fs (md0): recovery complete

After some SMART and badblocks scans I found out that one of the devices has block errors:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       6
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1123
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       13
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       8
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   099   099   010    Pre-fail  Always       -       6
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   099   099   010    Pre-fail  Always       -       6
187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always       -       278
190 Airflow_Temperature_Cel 0x0032   054   035   000    Old_age   Always       -       46
195 Hardware_ECC_Recovered  0x001a   199   199   000    Old_age   Always       -       278
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       6
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       31580059046

SMART Error Log Version: 1
ATA Error Count: 278 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 278 occurred at disk power-on lifetime: 1122 hours (46 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 38 40 19 04 40  Error: UNC at LBA = 0x00041940 = 268608

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 38 40 19 04 40 07  43d+18:19:46.250  READ FPDMA QUEUED
  61 08 28 10 00 00 40 05  43d+18:19:46.250  WRITE FPDMA QUEUED
  47 00 01 30 06 00 40 04  43d+18:19:46.250  READ LOG DMA EXT
  47 00 01 30 00 00 40 04  43d+18:19:46.250  READ LOG DMA EXT
  47 00 01 00 00 00 40 04  43d+18:19:46.250  READ LOG DMA EXT

and the badblocks scan:

root@node:/var/log# sudo badblocks -sv /dev/sda
Checking blocks 0 to 1953514583
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)
root@node:/var/log# sudo badblocks -sv /dev/sdb
Checking blocks 0 to 1953514583
Checking for bad blocks (read-only test): 1063390992ne, 44:44 elapsed. (0/0/0 errors)
1063390993
1063390994ne, 44:45 elapsed. (2/0/0 errors)
1063390995
1063391056ne, 44:47 elapsed. (4/0/0 errors)
1063391057
1063391058
1063391059
1063395472ne, 44:48 elapsed. (8/0/0 errors)
1063397200ne, 44:49 elapsed. (9/0/0 errors)
1063397201ne, 44:50 elapsed. (10/0/0 errors)
...

What is the optimal procedure for replacing a disk in a RAID1? I could shrink the array to a single device with mdadm --grow /dev/md0 --raid-devices=1 --force, then replace the faulty disk and grow the array back with mdadm --grow /dev/md0 --raid-devices=2 --add /dev/sdb, but is that the right way?

Mark
1 Answer


A single drive going bad is not enough reason for I/O errors to appear on the MD RAID device. It should never show I/O errors, even if some component device fails; that is the whole point of using it. So check SMART on both devices and also check the RAM.
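
A sketch of how that could look (assuming the smartmontools and memtester packages are installed; repeat per disk):

sudo smartctl -a /dev/sda          # full SMART report, including the error log shown above
sudo smartctl -t long /dev/sda     # start a long self-test; read the result later with smartctl -a
sudo memtester 1024 1              # quick userspace test of 1 GiB of RAM; memtest86+ from the boot menu is more thorough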

This is not the right way. You don't need to use grow mode.

You need to fail the bad disk if it did not set the failed flag by itself (mdadm -f /dev/mdX /dev/sdYZ), then remove it (mdadm -r /dev/mdX /dev/sdYZ).
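
With this particular array that would look roughly like the following (a sketch; substitute whichever member is actually the faulty one):

sudo mdadm /dev/md0 --fail /dev/sda      # only needed if the kernel has not already marked it faulty
sudo mdadm /dev/md0 --remove /dev/sda    # detach it from the array so the drive can be pulled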

When you have the new disk, partition it as you need and add it into the array (mdadm --add /dev/mdX /dev/sdYZ). The sync will begin automatically, and you can see how it is going with cat /proc/mdstat. The sync speed is capped at 200 MB/s by default; you can lift this restriction by writing the desired value in KB/s into /sys/block/mdX/md/sync_speed_max.
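
For this array that could look roughly like this (a sketch; the whole disk is added here to match the existing setup, and 500000 KB/s is just an example value):

sudo mdadm /dev/md0 --add /dev/sda                          # add the replacement; resync starts automatically
cat /proc/mdstat                                            # watch the resync progress
echo 500000 | sudo tee /sys/block/md0/md/sync_speed_max     # optionally raise the resync speed cap (value in KB/s)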

Don't forget to install the bootloader onto the new drive.
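
Whether and how to do that depends on how the machine boots; with whole-disk array members as here there may be nothing to install. For a classic BIOS/GRUB setup it would be something like (a sketch):

sudo grub-install /dev/sda      # put GRUB on the replacement disk
sudo update-grub                # regenerate the GRUB configuration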

Monitor SMART with some automatic tool. Monitor the RAID. Check your RAIDs monthly: echo check > /sys/block/mdX/md/sync_action; Debian does it automatically.
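
As a sketch (package and paths as on Debian/Ubuntu; adjust to taste):

sudo apt install smartmontools                          # provides smartd for periodic SMART checks and mail alerts
echo check | sudo tee /sys/block/md0/md/sync_action     # kick off a manual consistency check of md0
cat /sys/block/md0/md/mismatch_cnt                      # should read 0 once the check has finished
# mdadm --monitor (usually already running from the mdadm package) mails the MAILADDR set in mdadm.conf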

You absolutely should do all of this. Don't expect RAID to save you without monitoring.

Nikita Kipriyanov
  • Thank you for that. I will also scan the RAM. A 40-day-old drive should not have "Reported_Uncorrect" sectors, I think; I will have it replaced by the dealer. I checked the other drive (sda): it has no bad blocks, and the long SMART test does not show any errors either. Do you have suggestions for my setup? The main objective is to keep the data consistent in its most recent state. Old data states are worthless, so backups are useless for this use case. If I add a third drive to the RAID1, would that prevent the RAID from going read-only when a device fails? – Mark Jul 01 '22 at 12:59
  • Even two devices in a RAID should be sufficient; it shouldn't go read-only when one device fails. The problem you encountered is quite strange, it shouldn't behave like this. Test your system. // RAID never guaranteed consistency: if a device successfully reads garbage in the place of your data, it will happily return that garbage. RAID only guarantees no I/O errors as long as no more than a certain number of component devices fail. If you want consistency, use something more sophisticated such as ZFS, which uses data checksums and warns you if a component successfully returns damaged (inconsistent) data. – Nikita Kipriyanov Jul 01 '22 at 13:48
  • One additional comment: the Samsung 870 series is known to have problems under Linux, see e.g. https://www.phoronix.com/scan.php?page=news_item&px=Samsung-860-870-More-Quirks . I'd test that carefully. I supported a setup like yours (RAID1 on two SSDs) under Linux in the past and we had no apparent problems, but those were not Samsungs. Note also that this is a desktop drive; don't expect a long life in server applications. I've seen an 870 worn down to 40% of its rated lifetime in 1.5 years under an MS SQL database load. – Nikita Kipriyanov Jul 01 '22 at 13:53
  • Oh crap, I read the article you mentioned --> "However, if using a Samsung 860/870 with an AMD chipset, it's even worse." Perfect hit. Can you recommend a vendor/model for a 2 TB SATA SSD that has no known issues with AMD chipsets under Linux? – Mark Jul 01 '22 at 14:22