Last night I received an email from mdadm about the possible failure of two drives in my array. The array was set up as a RAID 5 of four 2 TB drives with one hot spare. Is this system truly fried? Did the hot spare pick up anything at all, or did the two drives fail at once? Or did one drive fail, start rebuilding onto the spare, and then cause another drive failure? I'm fairly new to working with RAID, and this system is one I inherited from a previous employee, so I'm unsure what the proper troubleshooting steps are here. Any help would be much appreciated.

Output of cat /proc/mdstat:

sudo cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
md0 : active raid5 sdc[4](F) sdd[5](F) sda[6](S) sdb[0] sde[3]
      5860543488 blocks level 5, 64k chunk, algorithm 2 [4/2] [U__U]

Output of mdadm --detail:

sudo mdadm --detail /dev/md0

/dev/md0:
        Version : 0.90
  Creation Time : Mon Jun 21 13:54:13 2010
     Raid Level : raid5
     Array Size : 5860543488 (5589.05 GiB 6001.20 GB)
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Apr 29 10:52:27 2013
          State : clean, FAILED
 Active Devices : 2
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 2874db80:a0f02d66:999df3c7:ff8f8e6e (local to host bigkahuna)
         Events : 0.10984

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       0        0        1      removed
       2       0        0        2      removed
       3       8       64        3      active sync   /dev/sde

       4       8       32        -      faulty spare   /dev/sdc
       5       8       48        -      faulty spare   /dev/sdd
       6       8        0        -      spare   /dev/sda
Zielak

  • Try shutting the server down and disconnecting it from power for a few minutes. I've seen drives that resurrect after cutting power completely - maybe because the drive firmware got reset, etc. If that doesn't help, restoring from backups will probably be unavoidable. – Tometzky Apr 29 '13 at 18:36

1 Answer


If there are no smartctl errors for those drives in the logs or in dmesg, you can try to reassemble the RAID:

mdadm --assemble /dev/md0 --scan --force
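
For the smartctl/dmesg pre-check mentioned above, something along these lines should work (a sketch: it assumes smartmontools is installed and reuses the device names from the mdstat output in the question):

# print the SMART health status and the drive error log for each array member
for d in /dev/sd[a-e]; do sudo smartctl -H -l error "$d"; done

# look for recent disk or SATA link errors in the kernel log
dmesg | grep -iE 'ata[0-9]|sd[a-e]|i/o error'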

Danila Ladner

  • sda, sdb, and sde all returned normal-looking smartctl logs. When I ran smartctl on sdc and sdd (the failed drives) I got the following error message, even with "permissive" set: "Terminate command early due to bad response to IEC mode page / Error Counter logging not supported / Device does not support Self Test logging". Would forcing it to assemble be an option? Would it do any good? – Zielak Apr 29 '13 at 18:07
  • What is the message? – Danila Ladner Apr 29 '13 at 18:08
  • Sorry, edited to include the error message in the comment. – Zielak Apr 29 '13 at 18:09
  • so with "smartctl --all /dev/sdc -T permissive" you still get this: "bad response to IEC mode page"? – Danila Ladner Apr 29 '13 at 18:23
  • Correct, along with "Error Counter logging not supported" and "Device does not support Self Test logging". – Zielak Apr 29 '13 at 18:33
  • Hmmm. That is weird. Usually this is the case when the SMART controller is broken or your disk is completely dead. -((( – Danila Ladner Apr 29 '13 at 19:02
  • Oddly enough, mdadm --assemble /dev/md0 --scan --force plus a reboot seemed to restore the superblocks on sdc and sdd. I don't yet know why they vanished or why those drives dropped out. I'll be checking to see if the drives are healthy. – Zielak Apr 29 '13 at 21:30
  • Yeah, it is very weird. But it is usually very hard to kill soft RAID, specifically mdadm, unless the disks have gone completely bad. I would advise watching the progress carefully, though, and replacing those drives when possible. I assume those 2 TB drives you have are not really enterprise-level drives either. – Danila Ladner Apr 29 '13 at 21:38
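
A minimal sketch of the monitoring and replacement workflow described in the last comment, assuming the array is /dev/md0 and that /dev/sdc turns out to be the drive that needs replacing (device names are illustrative; confirm them against mdadm --detail before removing anything):

# keep an eye on the array state and any rebuild progress
watch cat /proc/mdstat
sudo mdadm --detail /dev/md0

# once a replacement disk is installed, swap out the suspect member
sudo mdadm /dev/md0 --fail /dev/sdc
sudo mdadm /dev/md0 --remove /dev/sdc
sudo mdadm /dev/md0 --add /dev/sdf    # hypothetical device name for the new disk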