I have a Linux machine (CrunchBang 8.10) with three 1 TB hard drives set up as software RAID 5.

The array recently stopped working without warning.

cat /proc/mdstat shows the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdd[2](S) sdc[1](S) sdb[0](S)
      2930287488 blocks

unused devices: <none>

mdadm --detail /dev/md0 shows:


mdadm: md device /dev/md0 does not appear to be active.

I have tried running sudo mdadm -A /dev/md0 but get:

mdadm: /dev/md0 not identified in config file.

My /etc/mdadm/mdadm.conf shows:

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays

# This file was auto-generated on Thu, 21 May 2009 18:32:49 +0100
# by mkconf $Id$

Has my config been corrupted? Please help.

iali

1 Answer

Looks like your drives are all being reported as [S]pares. You should check your logs (dmesg, /var/log/messages) to see if there's any indication why this happened.
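Kernel logs are noisy, so it can help to cut them down to the lines that mention the array or its member disks. A minimal sketch, assuming the members are sdb, sdc and sdd as in the question; the function name md_log_filter and the search pattern are illustrative, not part of mdadm:

```shell
# Keep only log lines that mention an md device, RAID, the member disks
# (sdb to sdd, assumed from the question) or an I/O error. Reads stdin.
md_log_filter() {
    grep -iE 'md[0-9]+|raid|sd[b-d]|i/o error'
}
```

Typical use would be dmesg | md_log_filter, or sudo cat /var/log/messages | md_log_filter. The pattern is deliberately broad, so expect a few unrelated lines.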

Try running the following and look at the output:

sudo mdadm --examine --scan --config=/etc/mdadm/mdadm.conf

If it prints something like this:

ARRAY /dev/md0 level=raid5 metadata=1 num-devices=3 UUID=22f22c3599:613d5231:d407a655:bdeb84 name=backup:1

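Then you can append it to the bottom of mdadm.conf. A plain >> redirect is performed by your own shell rather than by sudo and fails with "permission denied", so pipe through sudo tee -a instead:

sudo mdadm --examine --scan --config=/etc/mdadm/mdadm.conf | sudo tee -a /etc/mdadm/mdadm.conf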
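Running the append step twice leaves duplicate ARRAY lines in the config. A guarded version could look like the sketch below; append_array_line is a hypothetical helper, it assumes a single ARRAY line, and it has to be run from a root shell (for example after sudo -s) to write to the real config file:

```shell
# Append an ARRAY line to a config file only if that exact line is not there yet.
# $1 = ARRAY line, $2 = config file (defaults to /etc/mdadm/mdadm.conf)
append_array_line() {
    line=$1
    conf=${2:-/etc/mdadm/mdadm.conf}
    if grep -qxF -- "$line" "$conf"; then
        echo "already present in $conf"
    else
        printf '%s\n' "$line" >> "$conf"
        echo "appended to $conf"
    fi
}
```

As root, append_array_line "$(mdadm --examine --scan)" would add the scanned definition once and report "already present" on later runs.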

Then try starting the array:

sudo mdadm -A /dev/md0
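To confirm whether the assembly worked, check /proc/mdstat again: an array that failed to start is still listed as inactive. A small sketch; inactive_arrays is a hypothetical helper name, and the optional file argument exists only so it can be tried on a saved copy of the file:

```shell
# Print the name of every md array that an mdstat-style file reports as inactive.
# $1 = file to read (defaults to /proc/mdstat)
inactive_arrays() {
    awk '$1 ~ /^md[0-9]+$/ && $3 == "inactive" { print $1 }' "${1:-/proc/mdstat}"
}
```

With the mdstat output from the question this prints md0; once the array is running it prints nothing.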

Good luck.

vmfarms
  • Thanks for the fast response. mdadm --examine --scan --config=/etc/mdadm/mdadm.conf gives me no output. dmesg prints a lot, but I'm not an advanced enough user to spot the problem areas. – iali Aug 16 '10 at 01:50
  • Ah, I had to use sudo. I get the following: ARRAY /dev/md0 level=raid5 num-devices=3 UUID=adc48f8c:fc8a367a:05731609:dddb53d2 There is no metadata=1, though. I get "permission denied" when I try your append command, even though I am using sudo. – iali Aug 16 '10 at 01:54
  • Yes, you need to use sudo for all commands. I've edited the post to reflect that. Try running the append command again with sudo. If it doesn't work, edit the mdadm.conf file and paste the contents manually. Then run the last command to assemble the array. – vmfarms Aug 16 '10 at 01:59
  • Update: I have managed to append the line. I now get "no devices found for /dev/md0" when I try mdadm -A /dev/md0 – iali Aug 16 '10 at 02:00
  • I ran "sudo mdadm --stop /dev/md0" then "sudo mdadm -A /dev/md0" I get the following message: "mdadm: /dev/md0 assembled from 1 drive - not enough to start the array" – iali Aug 16 '10 at 02:07
  • You could try forcing the assembly: mdadm -vv -Af /dev/md0. If that doesn't work, you could try to recreate the array using the following: sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd, if those are your devices. – vmfarms Aug 16 '10 at 02:23
  • When I try forcing, the message I get is "-A would set mdadm mode to assemble, but it is already set to misc". I am about to try recreating, but wouldn't using create wipe the data from the drives? – iali Aug 16 '10 at 02:29
  • My bad, mdadm is very picky about the order of the arguments. Try this one: sudo mdadm --assemble /dev/md0 --force – vmfarms Aug 16 '10 at 02:31
  • Thanks. I now get this: mdadm: forcing event count in /dev/sdb(0) from 587449 upto 587452 Segmentation fault – iali Aug 16 '10 at 02:35
  • Hmmm.. seems like your version of mdadm is suffering from a known bug (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=499643). You can try compiling a newer version of mdadm and run the command again, or if you're feeling bold, you can try to recreate the array with the commands I supplied earlier. Use at your own risk though. I've had success with recreating in the past. – vmfarms Aug 16 '10 at 02:41
  • Thanks, I'll try that. You wouldn't happen to know of any useful guides to compiling and installing the latest version of mdadm on (CrunchBang) Ubuntu 8.10, would you? – iali Aug 16 '10 at 02:55
  • I booted an Ubuntu 10.04 live CD, installed mdadm and tried the commands again. I am getting the following messages: "forcing event count in /dev/sdb(0) from 587449 upto 587452" and "failed to RUN_ARRAY /dev/md0: Input/output error" – iali Aug 16 '10 at 13:53
  • Hmm.. Does mdadm --detail /dev/md0 give you anything after you ran that? – vmfarms Aug 16 '10 at 14:02
  • Yes. It's saying Raid devices: 3, Total devices: 2, Working devices: 2, Failed devices: 0. One of the disks: /dev/sdc has been removed – iali Aug 16 '10 at 14:07
  • Oh yeah, the state is 'active, degraded, Not Started'. I've read another post where the guy used 'echo "clean" > /sys/block/md0/md/array_state'. Will using clean wipe the data? – iali Aug 16 '10 at 14:11
  • Well it's good that the array is active now with 2 of the 3 devices. RAID-5 can work that way. What you should do is see if you can start it now, and if successful, add the remaining drive. Try: mdadm -R /dev/md0 – vmfarms Aug 16 '10 at 14:22
  • Still saying 'mdadm:failed to run array /dev/md0: input/output error' – iali Aug 16 '10 at 14:25
  • Can you paste the output of cat /proc/mdstat? – vmfarms Aug 16 '10 at 14:27
  • You can try that echo command: echo "clean" > /sys/block/md0/md/array_state . This will reset the state of your array. If there's a real problem, mdadm will complain again and refuse to start. Give it a shot. – vmfarms Aug 16 '10 at 14:37
  • Also, make sure you re-add your missing drive: mdadm --manage /dev/md0 --re-add /dev/sdc – vmfarms Aug 16 '10 at 14:38
  • Brilliant! that worked!! Thank you soooooo much. Do I need to do anything else in order to keep it up? – iali Aug 16 '10 at 14:45
  • Sorry another thing. Not all of the data appears to be there. Does that mean that data is lost/corrupted? – iali Aug 16 '10 at 14:46
  • Oh, I see. I think it's rebuilding. Sorry, I was being stupid. – iali Aug 16 '10 at 14:51
  • No problem! Glad it all worked out. You should probably set up mdadm to email you when something like this happens again (http://www.novell.com/support/search.do?cmd=displayKC&sliceId=SAL_Public&externalId=7001034). You should also install smartmontools to alert you when it detects potential drive failures. BTW, if this solved your problem, make sure you mark it as accepted :D – vmfarms Aug 16 '10 at 15:01
  • First time using this forum, didn't notice the tick thing. Thanks again – iali Aug 16 '10 at 15:13
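The steps that ended up working in the comments above, collected in one place. This is only a sketch: the function just prints the commands so they can be reviewed and run one at a time as root. recovery_plan and its arguments are hypothetical names, /dev/sdc is the member that had been dropped in this particular case, and in the thread the sequence only succeeded from an Ubuntu 10.04 live CD because the older mdadm segfaulted on the forced assembly.

```shell
# Print the recovery sequence from this thread for review; nothing is executed.
# $1 = md device (default /dev/md0), $2 = member to re-add (default /dev/sdc)
recovery_plan() {
    md=${1:-/dev/md0}
    missing=${2:-/dev/sdc}
    name=${md##*/}    # "md0" from "/dev/md0", used in the sysfs path
    cat <<EOF
mdadm --stop $md
mdadm --assemble $md --force
echo clean > /sys/block/$name/md/array_state
mdadm --manage $md --re-add $missing
cat /proc/mdstat
EOF
}
```

The last command shows the rebuild progress; files may look incomplete until the re-added disk has finished resyncing, as happened in the thread.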