
We have a server which we inherited from the previous sysadmin. The RAID-5 array (hardware-based, on an IBM ServeRAID controller) is now unusable, reporting that there are not enough disks to operate.

In ServeRAID Manager I can see three physical disks and one logical volume. The logical volume's properties say: Data space: 698GB and Parity space: 232GB, from which I guess there should be four disks.

Am I right?

There are three disks physically in the server (each is a 250GB disk).
So how can I be sure whether the logical volume was originally built from three or four disks? I am unsure of the exact meaning of Data space and Parity space in this terminology; if I am supposed to sum them, it means four disks should have been there.
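
A quick sanity check of that arithmetic (assuming "Parity space" in RAID-5 equals one member disk's capacity, and that the controller reports binary gigabytes, so a 250GB drive shows as roughly 232GB):

    # RAID-5 across N disks: data space = (N-1) disks, parity space = 1 disk,
    # so N = (data space + parity space) / parity space
    echo $(( (698 + 232) / 232 ))   # prints 4, which would point at four disks

As the last comment below reveals, the array actually had three disks, so the controller's figures evidently cannot be read this literally.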

netmano
  • Your logic seems reasonable to me, but where are you going to go from there? The controller says the RAID is non-functional, so if you need the data you'll have to restore from backups, and if you're going to zero out the array, why do you care how many HDDs it used to comprise? – MadHatter Feb 17 '15 at 12:29
  • Well, there are two goals: first, I need evidence to report what happened and deal with the consequences; second, if it was three disks, we can give recovery software a try. – netmano Feb 17 '15 at 12:43
  • If it was three discs, the software probably wouldn't be saying "not enough discs to operate". – MadHatter Feb 17 '15 at 12:52
  • Here's what I don't get: while your math seems sound, if it were a RAID-5 array with 4 drives and one now missing, ServeRAID would have marked the array as "degraded", but it would still function with that single drive missing, just without parity. Another drive failure would cause total failure, but I'm surprised you can't boot the logical drive or see its data if the existing 3 drives are online and were part of the original array before you got it. – TheCleaner Feb 17 '15 at 14:03
  • @TheCleaner Now two drives say OK, and one says Rebuilding, but it seems to do nothing. According to the manual, rebuilding starts by itself if a hot spare is ready; there was no hot spare, and we did not issue a rebuild command, yet ServeRAID Manager reports this state. Our assumption is that the system was running degraded, and now another disk developed errors and faulted. We did not notice because Windows Server was running in ESXi, so Windows didn't even know it was running on ServeRAID, and no errors came up. – netmano Feb 17 '15 at 14:21
  • Well, the ESXi hardware monitor would've said if a drive failed, but that's neither here nor there. If you are just hoping to get the data, you could try booting into ServeRAID and marking all disks as "ONLINE", hoping to trick the server into booting. I've done this in the past with an xSeries server, though I didn't try booting it; instead I used a Linux boot disk to access the logical drive and move data off. – TheCleaner Feb 17 '15 at 14:27
  • @TheCleaner We will now try to clone the disks one by one and investigate the images with some RAID recovery tools. The server is a two-hour drive away, so until my colleague arrives back I am just preparing. I am not brave enough to try forcing the disks online without having a copy first :) – netmano Feb 17 '15 at 14:30
  • Sure. And if you have IBM hardware support on this, you can contact them as well. They have some internal tools that can help. I've had them assist in the past on a server that didn't have a good backup and 2 of 3 disks had failed. This was years and years ago, so maybe they've gotten even better tools now. – TheCleaner Feb 17 '15 at 14:41
  • Thanks for everything! We figured out it was three disks; one of them had bad sectors. All data was recovered from two of the three RAID-5 members. What IBM ServeRAID Manager stated about data space and parity space is very confusing, as it gives no exact information about how many disks there should be. By the way, we used reclaime.com to rebuild the array image, which was a VMFS volume; we extracted it with http://glandium.org/projects/vmfs-tools/, then mounted the VMDK on Windows with official VMware software (roughly the workflow sketched after these comments). – netmano Mar 14 '15 at 21:04
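
For reference, the clone-then-mount workflow described in the last two comments could look roughly like this (a minimal sketch; the device names, image paths and mount point are all hypothetical, and GNU ddrescue stands in for whatever imaging tool is at hand):

    # image each member disk, continuing past bad sectors (GNU ddrescue)
    sudo ddrescue /dev/sdb /data/disk1.img /data/disk1.map
    # once the recovery tool has rebuilt the array image as a VMFS volume,
    # mount it with vmfs-fuse from vmfs-tools
    sudo mkdir -p /mnt/vmfs
    sudo vmfs-fuse /data/array.img /mnt/vmfs
    # the VMDKs should then be visible for copying out
    ls /mnt/vmfs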

1 Answer


Just a shot in the dark, but after reading the question and the 8 comments this seems possible: one of the disks failed, but the disk that got removed was not it, possibly the admin's mistake. If you can get the missing one back, there's some hope.

Does ServeRAID use DDF metadata? If so, it may be worth trying Linux mdadm (version 3.0+) to re-assemble the array if the hardware controller refuses to.
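
If it does, a minimal sketch of the attempt could look like this (assuming the member disks appear as /dev/sdb through /dev/sdd; those device names are hypothetical):

    # check whether mdadm recognises RAID metadata on the member disks
    sudo mdadm --examine /dev/sdb /dev/sdc /dev/sdd
    # if DDF metadata is reported, try assembling read-only first
    sudo mdadm --assemble --scan --readonly

Assembling read-only keeps mdadm from writing to members whose state is already fragile.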

sam_pan_mariusz