
Basics: a server with 4 drives, 2 of them solid state. One of the SSDs appears to have failed. Running VMware ESXi 5.0.

We tried to distribute the VMs over several disks and to use RAID, but I'm not sure whether it was set up correctly. The intent was that if one of the disks ever failed, we would still be OK. However, it may have had the opposite effect. Here is the startup error:

Failed to start the virtual machine.
Module DiskEarly power on failed. 
Cannot open the disk '/vmfs/volumes/54d9758a-23d4381c-9118-40167e7bd317/atlassian.somedomain.com/atlassian.somedomain.com_9-000003.vmdk' 
or one of the snapshot disks it depends on. 
5 (Input/output error)
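For reference, the chain can be inspected from the ESXi shell before changing anything; vmkfstools -e walks a virtual disk's snapshot chain and reports whether it is consistent. A sketch, using the path from the error above and assuming the datastore still mounts:

    # Check whether the snapshot chain behind the failing disk is intact
    vmkfstools -e /vmfs/volumes/54d9758a-23d4381c-9118-40167e7bd317/atlassian.somedomain.com/atlassian.somedomain.com_9-000003.vmdk

    # List which descriptor and extent files actually exist in that directory
    ls -l /vmfs/volumes/54d9758a-23d4381c-9118-40167e7bd317/atlassian.somedomain.com/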

Under Properties for the VM, I can see: [screenshot: disabled drive in Settings]

Here are the drives when SSHing into the VMware server: [screenshot: list of available drives]

Here are the contents of HDD1: [screenshot: contents of HDD1's folder]

Contents of HDD2: [screenshot: contents of HDD2's folders]

Contents of SSD1: [screenshot: contents of SSD1's folder]

Finally, when I look at SSD1's atlassian.somedomain.com.vmx file, I can see:

[screenshot: contents of the VMX file]

Note the reference to SSD2 (54d9758a-23d4381c-9118-40167e7bd317), looking for atlassian.somedomain.com_9-000003.vmdk.
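For context, the disk reference inside a .vmx normally looks like the lines below; the scsi0:1 device label here is illustrative rather than copied from my actual file:

    scsi0:1.present = "TRUE"
    scsi0:1.fileName = "/vmfs/volumes/54d9758a-23d4381c-9118-40167e7bd317/atlassian.somedomain.com/atlassian.somedomain.com_9-000003.vmdk"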

What's strange is that some of the other VMs don't have the same problem, even though they also have files on that same failed drive.

I'm not sure how to proceed, and before I make a 'final', unrecoverable mistake, I wanted to get feedback on next steps.

I could:

1) Delete the affected Hard Disk from the VM's hardware list: [screenshot: deleting the drive]

2) Alter SSD1's atlassian.somedomain.com.vmx file, pointing it to version _8 instead of the missing _9 (see the sketch just after this list)

3) Any other suggestions?
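If option 2 is workable, I assume the change would look roughly like the sketch below. The scsi0:1 label and the _8 file name are my guesses at the naming, not verified against the real files, and I'd back up the .vmx first:

    # Back up the .vmx before touching it
    cp atlassian.somedomain.com.vmx atlassian.somedomain.com.vmx.bak

    # In the .vmx, repoint the disk entry from the missing _9 delta
    # to the corresponding _8 disk, e.g.:
    #   scsi0:1.fileName = "atlassian.somedomain.com_8.vmdk"

    # Verify that the remaining chain resolves before powering on
    vmkfstools -e atlassian.somedomain.com_8.vmdk

    # ESXi reads the .vmx at registration, so reload the VM afterwards
    # (get the vmid from: vim-cmd vmsvc/getallvms)
    vim-cmd vmsvc/reload <vmid>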

NOTE: The purple in the images is where I've covered up the actual domain name.

EDIT: Note that I understand I may end up losing _10 and _11 if they are all interdependent, since I may have to roll everything back to _8. If need be, so be it. I just need to recover as much as possible.

NEW2WEB
  • VMware 5?! You know it's end of life right? Maybe it's time for an upgrade... – GregL Jan 16 '18 at 12:27
  • Please do NOT put screenshots in your question. Copy and paste **the console text** and insert as quote. – Zac67 Jan 16 '18 at 18:58

1 Answer


Distributing a VM's files over several disks without redundancy ensures that you can't start the VM unless all of those disks are online: every added disk is another single point of failure.

RAID is essential in a serious setup. If you can live with some extended downtime, a good backup may substitute for RAID; otherwise you need both. Also make sure that the RAID and the backups actually work.

Rebuild the storage with a solid RAID setup. Restore from backup. That's it.

These may be harsh words, but it's the truth, sorry.

One hint though: if you need to recover as much as possible and have a mounted VMFS volume but a partially unreadable .vmdk, you can use dd with conv=noerror to work around the read errors and copy the vmdk with some 'holes' in it. Don't expect the VM to work afterwards; it just gives you somewhat more to work with.
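A minimal sketch of that, with placeholder paths and assuming the ESXi shell's dd supports the usual conversion flags (sync pads each unreadable block with zeros so the copy keeps its original size and offsets):

    # Copy the flat extent, skipping read errors and zero-padding bad blocks
    dd if=/vmfs/volumes/failed-datastore/vm/vm-flat.vmdk \
       of=/vmfs/volumes/good-datastore/vm/vm-recovered-flat.vmdk \
       bs=1M conv=noerror,sync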

Zac67