
I have an Ubuntu 18 server using a software RAID-1 array, and smartctl is indicating that my /dev/sda is failing. Both Throughput_Performance and Reallocated_Sector_Ct are listed as "FAILING_NOW".

However, this has not triggered a RAID failure, as /proc/mdstat still indicates both drives are fine.

Unfortunately, even though smartctl shows my /dev/sdb is perfectly fine, the error on /dev/sda has somehow caused my filesystem to become read-only. Any attempt to write or delete files fails with an error like:

rm: cannot remove '<somefile>': Read-only file system

I understand Linux does this when it detects a possible drive failure, in order to prevent writes causing further corruption.

Yet I thought the whole point of RAID-1, and the reason I used it, was that it prevents a single drive failure from corrupting the filesystem by mirroring all data between two drives. A drive failure should stop the mirroring but still allow the OS full read/write access to the remaining drive, right?

I'm going to replace the failing drive, but how do I fix the filesystem in the meantime, and how do I prevent a single drive failure from breaking my RAID-1 array?

Cerin

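If a drive is corrupting data, the MD layer has no way of knowing that. It may return or store invalid data, and the filesystem may then use that data to write corrupted data back to the other disk as well. If the filesystem gets the wrong idea of its own structure, it will make wrong decisions; when it detects that kind of inconsistency, it can remount itself read-only (e.g. ext4 with the errors=remount-ro mount option).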

The MD layer doesn't read data from multiple drives and compare it, nor is it very smart about drive failure: it will keep trying to use a malfunctioning drive indefinitely. That's why I monitor my kernel log for variations of 'ata exception', because those indicate a failing drive long before anything else does. I then mark the drive as failed and remove it with mdadm --manage /dev/mdX --fail /dev/xxx --remove /dev/xxx, and add a replacement with mdadm --manage /dev/mdX --add /dev/xxx.
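Here is a rough Python sketch of that kind of kernel-log monitoring. It assumes that 'ata exception' refers to libata messages such as "ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen", and the exact wording varies between kernel versions. The regex and the function name count_ata_exceptions are illustrative and not part of any existing tool. Mapping an ataN port to a /dev/sdX name is left out.

```python
import re
from collections import Counter

# Assumed libata message shape: "ataN: exception Emask ..." (port level) or
# "ataN.DD: exception Emask ..." (device DD on that port).
ATA_EXCEPTION_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: exception Emask\b")


def count_ata_exceptions(log_lines):
    """Count libata exception messages per ATA port (e.g. 'ata1') in log lines."""
    counts = Counter()
    for line in log_lines:
        match = ATA_EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

You could feed it the output of dmesg or journalctl -k, for example count_ata_exceptions(subprocess.run(["journalctl", "-k"], capture_output=True, text=True).stdout.splitlines()), and alert when a port's count starts growing.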

In the meantime, you may want to remove the failing drive and take a full dd clone of the /dev/mdX device as a backup (dd if=/dev/mdX of=/some/file/on/some/other/device bs=4M, or write to stdout and pipe it over ssh). Then, booted from a sysrescue CD/USB stick, run fsck -f -C /dev/mdX, multiple times (in my experience, a single pass has not always been enough).

(It may instead be smarter to back up the healthy component drive rather than the /dev/mdX mirror device.)
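The dd step is a plain sequential copy in fixed-size blocks. The following minimal Python equivalent of dd if=/dev/mdX of=... bs=4M is meant only to illustrate that. clone_device is a hypothetical name, and reading a block device with it normally requires root.

```python
BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB, matching dd's bs=4M


def clone_device(source_path, dest_path, block_size=BLOCK_SIZE):
    """Copy source_path to dest_path in fixed-size chunks; return the bytes copied."""
    copied = 0
    # Unbuffered read side: each read() goes straight to the device.
    with open(source_path, "rb", buffering=0) as src, open(dest_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)  # may be shorter than block_size near the end
            if not chunk:  # b"" means end of device/file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

Like dd without conv=noerror, it stops with an OSError at the first unreadable block instead of skipping it.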

Halfgaar