During boot, I got a message saying an md RAID array was degraded. My first reaction was to reboot. Everything appears to be working just fine now. All disks are active.

What concerns me is that one of the disks is plodding along on a thin lifeline. How can I diagnose which disk it is that temporarily failed? Can I run some tests to see if it needs to be replaced? It's still under warranty, so if I do replace it I'd like to be able to claim it really is near death.

MadHatter
moinudin

1 Answer

Firstly, the mdadm warning you got should show you which disc(s) were degraded. Could you cut-and-paste that into your question, along with your current `/proc/mdstat` output?
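As a rough sketch of what to look for in that output: in `/proc/mdstat`, each array's status string shows `U` for a member that is up and `_` for one that is missing or failed. The helper below, `degraded_arrays`, is a hypothetical name of my own, and it assumes the usual layout where the status string follows the `mdN :` line.

```shell
# Print the name of every md array whose status string, e.g. [U_],
# contains an underscore (a missing or failed member).
# Reads /proc/mdstat unless another file is given as $1.
degraded_arrays() {
    awk '
        /^md[^ ]* *:/        { name = $1 }    # remember the current array
        /\[[U_]*_[U_]*\]/    { print name }   # status string with a gap
    ' "${1:-/proc/mdstat}"
}
```

For any array it prints, `mdadm --detail /dev/md0` (with `md0` swapped for the real device) lists each member and its state.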

Secondly, write errors to an md device should be logged via syslog. Can you find anything with `grep sda /var/log/messages`? You will probably need to run through `sd[a-f]` and `hd[a-d]` in the grep to be sure of catching all likely discs.
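Rather than running the grep once per disc, a single extended regular expression can cover the whole range. This is only a sketch: `disk_log_lines` is a made-up helper name, and the pattern is deliberately loose, so it may also match unrelated words that happen to contain those letters.

```shell
# Print every log line that mentions sda..sdf or hda..hdd.
# Searches /var/log/messages unless another file is given as $1.
disk_log_lines() {
    grep -E 'sd[a-f]|hd[a-d]' "${1:-/var/log/messages}"
}
```

If nothing reached the log file, the same pattern can be applied to the kernel ring buffer with `dmesg | grep -E 'sd[a-f]|hd[a-d]'`.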

Thirdly, `smartctl -a /dev/sda` should give you health check information on `/dev/sda`, and similarly for your other HDDs, if they're SMART-aware, as most modern drives are.
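The full `-a` output is long, so here is a minimal sketch that filters the attribute table (`smartctl -A`) down to a few counters that commonly signal a dying disc. Which attributes matter is my assumption, attribute names vary between vendors, and `smart_warning_attrs` is a hypothetical helper name. It also assumes the standard ten-column table with `RAW_VALUE` last.

```shell
# Read a `smartctl -A` attribute table on stdin and print the name and
# raw value of selected attributes whose raw value is non-zero.
smart_warning_attrs() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
        $10 + 0 > 0 { print $2, $10 }   # column 10 is RAW_VALUE
    '
}
```

Typical use would be `smartctl -A /dev/sda | smart_warning_attrs`, repeated for each disc. No output means none of those three counters has moved, which is not the same as a guarantee of health.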

If you can't get anything out of that, it's probably not failing very badly!

Later edit: Marcog, sorry, my bad, I missed the "during boot" bit about your warning. I really should read more carefully. I agree with you that it sounds like a disc not being correctly detected. I do recommend the smartctl route, though; it can be used with `-t` to force one of a suite of tests (see the man page), many of which are suitable for use on a live disc (one that contains data and is mounted). I run these every few months just to try to avoid unpremeditated failures.
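A minimal sketch of that routine follows. The wrapper names are hypothetical, and the `SMARTCTL` variable exists only so the commands can be dry-run with something like `SMARTCTL=echo`. The underlying `smartctl -t` and `smartctl -l selftest` options are real.

```shell
# Command to run; override with SMARTCTL=echo to see what would be executed.
SMARTCTL="${SMARTCTL:-smartctl}"

# Start a self-test. $1 is the test type (short, long or conveyance),
# $2 is the device, e.g. /dev/sda. The test runs inside the drive.
start_selftest() {
    "$SMARTCTL" -t "$1" "$2"
}

# Show the drive's self-test log, including the result of earlier tests.
show_selftest_log() {
    "$SMARTCTL" -l selftest "$1"
}
```

For example, `start_selftest short /dev/sda` returns almost immediately, and `show_selftest_log /dev/sda` can be checked a few minutes later. A `long` test can take hours on a large disc.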

MadHatter
  • `/proc/mdstat` gave nothing, but I'm running a check now (see @poige's answer) I'll post it once that's finished. `/var/log/messages` has nothing, because no partition was mounted because of this issue so it couldn't write to disk. `smartctl` doesn't give anything. It sounds to me like this was a disk that wasn't detected at power-up or something, so hopefully just a false alarm. Can't be too cautious though. – moinudin Feb 26 '11 at 13:22