
I have 4 x 8TB disks configured in RAID 5; as it is, the system can survive one disk failure, but if more than one disk fails, the array is destroyed. How can I check whether a disk has already failed, so that I can back up in time?

This command does not deliver that information:

fdisk -l
Van Gogh
  • If you have Linux software RAID, then `cat /proc/mdstat`. Ideally, you should integrate it into your monitoring, so that it alerts you when any array is in degraded mode. A RAID in degraded mode does not mean all is still well; it means you should go to the server ASAP and replace the disk. Your gain is that your boss/customers/users/colleagues do not experience downtime. If your backup is fine, then it is an acceptable compromise to order (or initiate the ordering of) the new drive on the spot. (See the example mdstat output after these comments.) – peterh Mar 05 '21 at 13:41
  • Oh man, this is a bad idea. RAID 5 is dead as a useful technology and has been for over a decade. I've been on this site since it started, and every week or so we get someone asking us to help recover their data from a damaged R5 array; each time we have to say that their only option is to restore from backup. It's seriously dangerous, especially with big disks. Really only R1/10 and R6/60 (and RAID-Z if that's your thing) are to be trusted. If you don't believe me, just do some googling about 'raid 5 dangerous' or similar. Also, you should be backing up all the time. – Chopper3 Mar 05 '21 at 15:04
  • @Chopper3, sorry, but that answer is a bunch of hand-waving nonsense, except for the part about still needing backups. A 3-disk RAID 5 is only slightly more likely to fail than a 2-disk RAID 1, simply because you have one more disk that could fail. A lot of people asking for help with their R5 arrays is something you would expect not because they have a high failure rate, but because they are in widespread use. I've seen plenty of people asking for help with R1 too; that doesn't mean it's "dead as a useful technology". – psusi Mar 05 '21 at 19:46
  • @Chopper3 RAID 5 or any other RAID is not a backup solution. Its main purpose is not protection against data loss, but to allow the system to keep running when a disk breaks. **Of course** you still need backups. By the way, even if nothing breaks, a RAID doesn't help if you delete a file by accident. – berndbausch Mar 06 '21 at 06:43
  • Does RAID 10 on 6 disks mean that 4 disks can fault and I can still maintain the system? – Van Gogh Mar 06 '21 at 09:29
  • @psusi Yes, but a 2-disk RAID 1 gives you the capacity of a single disk at the price of 2, while a 3-disk RAID 5 gives you 2 disks at the price of 3. In my experience, practical tasks come in 2 types: in the first, an app has a small disk need (much smaller than 1 TB); in the second, no storage exists that would be enough for the app (the disk need runs from 1 TB to infinity). I think RAID 5 is still a good idea for the latter. – peterh Mar 06 '21 at 13:59
  • https://www.zdnet.com/article/why-raid-5-stops-working-in-2009/ – Chopper3 Mar 07 '21 at 11:59
  • No, RAID 10 on 6 disks can only be sure of handling 1 failure. It *might* handle 3, but only if the right 3 fail. That is, assuming you are only storing two copies of the data; Linux RAID 10 can store more copies to handle more failures, at the expense of less usable space. – psusi Mar 08 '21 at 13:49
  • @Chopper3, I've had many drives over the last 20 years that have made it to 5 years without failing. Only once did I have a drive start throwing SMART errors around 4 years, and so I replaced it before it failed. I guess I'm a statistical outlier. That article grossly misunderstands the URE rate: according to that misunderstanding, if you were to read an entire multi-TB disk several times, you would expect to get a URE. This is an easy experiment to do yourself; I have done so quite a few times and never had a URE. – psusi Mar 08 '21 at 14:05
  • Ok, you know best - ignore the dozens of people who've come here over the last decade+ with corrupted R5 data, they don't exist :) – Chopper3 Mar 09 '21 at 16:15
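
To illustrate the mdstat suggestion in the comments above: on Linux software RAID, /proc/mdstat shows at a glance whether an array has lost a member. A sketch of what healthy output might look like (the array name, device names, and block counts here are illustrative, not from the OP's system):

$ cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      23441679360 blocks level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

After a member fails, the failed disk is flagged with (F) and the status drops from [4/4] [UUUU] to something like [4/3] [U_UU], the underscore marking the missing disk. For more detail on a specific array, sudo mdadm --detail /dev/md0 can be used.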

1 Answer


What system is the RAID hosted on? Most modern NASes (I've operated both Synology and QNAP) have built-in notification channels and strategies readily available, covering a variety of events, disk failure being one of the most common. This all goes through the NAS's GUI. More generally (and including these NASes, which are Linux-based machines), one candidate for investigating disk status from the command line is smartctl, which will give you detailed information about the S.M.A.R.T. status of the disks. You should find enough detail here (especially on how to install it if it isn't already): https://www.smartmontools.org
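
If smartctl isn't present, installing it is usually a one-liner; for example (assuming a Debian/Ubuntu- or RHEL-style distribution, where the package is called smartmontools on both):

$ sudo apt install smartmontools   # Debian/Ubuntu and derivatives
$ sudo yum install smartmontools   # RHEL/CentOS and derivatives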

Once you have it, you can first of all check if the disks support S.M.A.R.T. (again, most modern, mainstream disks do). This is one of my disks in my Synology box:

$ sudo smartctl -i /dev/sda
Password: 
smartctl 6.5 (build date Mar 30 2020) [x86_64-linux-3.10.105] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Iron Wolf PRO
Device Model:     ST4000NE001-2MA101
Serial Number:    XXXXXXX
LU WWN Device Id: 5 000c50 0cbe3d8cb
Firmware Version: EN01
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   Unknown(0x0fe0) (minor revision not indicated)
SATA Version is:  SATA >3.2 (0x1ff), 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Mar  5 14:56:21 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

As you can see, S.M.A.R.T. is available and enabled on this disk. If it's not enabled, smartctl can attempt to enable it (using the -s or --smart option).
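
For example, a quick one-liner along these lines should turn it on (same device name as in the example above):

$ sudo smartctl -s on /dev/sda

With S.M.A.R.T. enabled, you can quickly glance at the disk's overall status: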

$ sudo smartctl -H /dev/sda
smartctl 6.5 (build date Mar 30 2020) [x86_64-linux-3.10.105] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

and that informs me that the disk is in good health. You can then use smartctl for more detailed analysis, gathering information about each individual S.M.A.R.T. attribute supported by your disks (not all disks support the very same set, even within the same manufacturer and the same family of disks).
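
For instance, the full attribute table (including common early-failure indicators such as Reallocated_Sector_Ct and Current_Pending_Sector, where the disk reports them) can be listed with the -A option:

$ sudo smartctl -A /dev/sda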

develox
  • The OP is probably using Linux software RAID. – peterh Mar 05 '21 at 13:44
  • That'd be my guess too. In this case smartctl would be of help for the disks, while your suggestion of mdstat is good enough for the RAID volumes. – develox Mar 05 '21 at 13:46
  • I tried with "sudo smartctl -i /dev/sda" and it shows me only one disk's information, only 4TB, but I have 4 x 4TB. – Van Gogh Mar 05 '21 at 18:59
  • @VanGogh, yea... so check the other 3 disks as well. – psusi Mar 05 '21 at 19:48
  • How can I check the other 3 disks? When I go with "sudo smartctl -i /dev/sda" it shows me only one. – Van Gogh Mar 06 '21 at 09:28
  • You can use the same "fdisk -l" command you started with; it will give you the complete list of disks and partitions in your system. If your disk naming scheme starts with /dev/sda, as I understood, then most probably you'll also have /dev/sdb, /dev/sdc, etc. But check with the fdisk command, and of course concentrate your attention on disks, not partitions. You'll recognise the disks because their lines start with "Disk ...". Partitions, on the contrary, are listed under "Device" and have Start and End sectors indicated. – develox Mar 06 '21 at 13:16
  • Yes, it showed me /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd. Each has 3.7 TiB. At the end there is "Disk /dev/md125: 14.5 TiB". I use RAID 0, and if one disk faults the system crashes. But on RAID 5, if one of the disks faults, I will see only 3 disks, yes? – Van Gogh Mar 06 '21 at 13:43
  • Just repeat the above smartctl commands, replacing /dev/sda each time with the other disks (/dev/sdb, /dev/sdc and /dev/sdd), and you'll know the status of every single disk (see the loop sketch after these comments). – develox Mar 06 '21 at 14:01
  • OK, it shows me the following info: https://i.ibb.co/Tc1vfSX/Screenshot-7.png And if the disk is damaged, what will the output be? Nothing? – Van Gogh Mar 06 '21 at 14:08
  • As I wrote in my answer above, the command you ran only tells you whether S.M.A.R.T. is available and enabled (and so it seems). The real option that tells you the disk's health status is -H; see my example above. – develox Mar 06 '21 at 21:37
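
For completeness, a small shell loop can run the health check across all four disks in one go (a sketch; adjust the device list to match your system):

$ for d in /dev/sd{a,b,c,d}; do sudo smartctl -H "$d"; done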