
I recently came home from vacation only to wake up the following morning with two disks in my RAID 5 array marked as faulty. I was able to start the RAID by forcing it to use the faulty disks and run, and I was able to salvage some of the important data I had on it.
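For reference, the forced start looked roughly like this (a sketch from memory; the array name and member devices are placeholders rather than my actual ones):

    # Stop the array if it came up partially assembled
    mdadm --stop /dev/md0

    # Force assembly using the members that were marked faulty;
    # mdadm picks the freshest superblocks it can find
    mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

    # Confirm the array is running (possibly degraded)
    cat /proc/mdstat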

For some reason the server was acting up and would not let me execute commands under sudo or su, so I decided to reboot. Once I did, the drives changed names, so now I don't know which ones were the original faulty ones.
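(Aside: sdX letters are not stable across reboots, so the reliable way to track a physical drive is by its serial number. A minimal sketch, using /dev/sdd as an example:)

    # Persistent names that encode model and serial number
    ls -l /dev/disk/by-id/

    # Or read the serial straight from the drive's SMART identity page
    smartctl -i /dev/sdd | grep -i 'serial'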

Now I'm trying to determine which disks are bad so that I can replace them, but it's not turning out to be very easy. All of the disks are still functioning, but there are at least two that I'm certain I do not want in my new RAID. I was hoping that you could all try to help me out here.

I ran badblocks on all the disks and it didn't report any errors. smartctl, however, gave me some interesting information about the disks, but I don't really know what to make of it.
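Roughly what I ran, for reference (the device name is an example; badblocks in its default non-destructive read-only mode, since the disks still hold data):

    # Read-only surface scan; -s shows progress, -v reports each bad block
    badblocks -sv /dev/sdd

    # Full SMART report: health status, attributes, error log, self-test log
    smartctl -a /dev/sdd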

Here are the log files, updated as recently as today.

The newest tests are tagged under LATEST with the date (03/11/2014).

I can only post two links, so here are the other ones...

/dev/sdd: http://paste.ubuntu.com/8808126/
/dev/sde: http://paste.ubuntu.com/8808128/

Thanks in advance

Jonathan

  • Just a guess, but might the two faulty disks be the ones which were marked as faulty? – Michael Hampton Nov 03 '14 at 21:33
  • @MichaelHampton, yeah, obviously. The issue is that I could not check which disks they were, as the server was acting strange. Neither sudo nor su would work, so I had to reboot - and when I did, they changed names. – Jonathan Lundström Nov 03 '14 at 21:35
  • I don't know if you actually _read_ those pastes, but two of those disks reported errors, and two did not. – Michael Hampton Nov 03 '14 at 21:45
  • @MichaelHampton Of course I read the pastes, but I also want to make sure that there is nothing wrong with the other disks as well. I'm not an expert at this. Everyone I ask tells me the opposite of what the previous ones told me... – Jonathan Lundström Nov 03 '14 at 21:54

1 Answer


/dev/sda and /dev/sdd are failing. Drives that fail the extended self-test can fail completely at any time. The self-test puts very little stress on the drive, so a failure there is a clear signal to replace the drive immediately.
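To see this yourself, the failures show up in the drive's self-test log. A quick sketch of where to look (the device name is an example):

    # Start the extended (long) self-test, then poll the log once it finishes
    smartctl -t long /dev/sda
    smartctl -l selftest /dev/sda

A failing drive typically shows a status such as "Completed: read failure" along with the LBA of the first error, while a healthy one reports "Completed without error".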

Also, it looks like the drives have overheated in the past, so I would advise checking your server's airflow.
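The temperature history is also recorded in the SMART attributes; a quick way to check (attribute naming varies a little by vendor):

    # Attribute 194 (Temperature_Celsius) holds the current value; on many
    # drives the raw field also records the lifetime min/max
    smartctl -A /dev/sda | grep -i temp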

Nathan C
  • Thanks Nathan, then it is as I expected. I have replacement disks ready so I'll swap them out in the morning. The server is usually around 40°C, but I'll make sure to plug in the rest of the fans to get it down a bit more. – Jonathan Lundström Nov 04 '14 at 03:20