0

I have a new PowerEdge T320:

  • Intel Xeon E5-1410 @ 2.8 GHz
  • 16GB
  • Windows Server 2008 R2 configured as a Domain Controller
  • Dell PERC S110 software RAID
  • 4x 7200 rpm 1TB drives configured in RAID 5

I have determined that I have a faulty drive (for more details, see this other post). I'm not sure which drive it is, so I would like to shut down my server, pull a drive out, test it with a USB-to-SATA adapter, and slide it back in. However, I do not under any circumstances want to trigger a rebuild of a good drive.

So, if the server is completely powered off, will the RAID controller (note: mine is a software RAID) recognize that a drive has been pulled and try to rebuild it on restart? Or can I pull each drive and test it, then put them all back in and boot normally?

Alternatively, is there a bootable utility that will give me HD Tune-style read/write tests?

Blackjack00
  • 333
  • 1
  • 5
  • 12
  • Reiterating my comment below, the answer was correct and I was able to test. While this did not fix my problem, I was able to get the entire system replaced. – Blackjack00 Mar 15 '13 at 03:51

3 Answers

1

RAID controllers aren't capable of telling whether a drive (or a whole array) has been pulled while the server is powered off... unless the drive is still missing or in a different slot when the server is powered back on.

So, your plan will work, provided you put the drives back into the same slots and don't power the server back on until all the drives are back in place. You'll want to label the drives with their positions before you pull any of them out, however.

HopelessN00b
  • 53,795
  • 33
  • 135
  • 209
  • 2
    I'm not sure how you will be able to verify that the drive is "good" by just connecting it to your computer via a USB to SATA connector. Since the drive was part of an array, you will want to be super careful that you don't do anything to destroy the partition on the drive when doing your testing. – Harold Wong Mar 01 '13 at 20:21
  • @HaroldWong, one way you could tell it was good would be to run a non-destructive badblocks scan. I would want to actually connect it via SATA though so I could get SMART data, since most USB controllers don't seem to pass that. – Zoredache Mar 01 '13 at 20:25
  • Thank you very much for the advice. @HaroldWong, the symptom is an extremely slow read/write time ( < 5Mbps). I am hoping the drive with the issue will be obviously slower when testing individually. – Blackjack00 Mar 01 '13 at 21:19
  • 1
    After having completed testing, I determined that the problem was not the drives. I was left with a cabling issue (which would be very strange) or a software problem associated with the software raid. Thankfully, Dell finally agreed to replace the entire box and we had them put a hardware raid controller in it. Lesson learned: do not use software raid for any reason, ever, ever. – Blackjack00 Mar 15 '13 at 03:50
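The per-drive speed comparison discussed in the comments above could be approximated with a read-only sequential throughput check. This is only a minimal sketch, not a replacement for badblocks or SMART data: the function name `read_throughput` and its parameters are hypothetical, it only ever reads from the device, and the number it reports can be inflated by the operating system's cache if the same region was read recently.

```python
import time


def read_throughput(path, chunk_size=1024 * 1024, max_bytes=256 * 1024 * 1024):
    """Sequentially read up to max_bytes from path and return (bytes_read, MiB/s).

    The path is opened read-only, so nothing on the drive is modified.
    """
    total = 0
    start = time.perf_counter()
    # "rb" with buffering=0: raw, read-only access; no write path exists here.
    with open(path, "rb", buffering=0) as dev:
        while total < max_bytes:
            chunk = dev.read(min(chunk_size, max_bytes - total))
            if not chunk:  # end of file or device reached
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = (total / (1024 * 1024)) / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

Run against each drive in turn (for example `read_throughput("/dev/sdX")` on a Linux live environment, where `/dev/sdX` is a placeholder for the actual device node and root privileges are assumed), a drive that is dramatically slower than its siblings should stand out.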
1

You should be able to pull status on the drives in the RAID, and see which one is showing as Defunct or Predictive Failure Alert (PFA), from within your RAID management console.

Slow access usually means that one drive has already failed, and that the RAID management is having to rebuild data from parity information, or otherwise verify that it's OK. This should be shown as an alert within the RAID management; you should not have to pull each drive to test them because the 'bad' drive will be flagged with an icon.

Now, if you're using a RAID controller but have no management console installed... then you might run into difficulties... :)

Since you're running Dell PowerEdge Software RAID, Dell's OpenManage™ Storage Services would be recommended for RAID management and monitoring. If you don't have it installed, I'd recommend that you procure a copy and use it.

George Erhard
  • 814
  • 6
  • 12
0

Yes, you can remove the drives in this way. I wouldn't recommend this process, but the RAID controller will not initiate a rebuild based on drives having been removed while the system is offline.