A bit of background: I have an ESXi 5.5 cluster with vCenter HA. I have multiple iSCSI LUNs hosted on an Ubuntu server running an iSCSI target and software RAID (mdadm).
A few days ago I noticed that a number of VMs were inaccessible. I removed them from the inventory, thinking I'd add them back by browsing the datastore.
The datastore was showing as inactive. The other datastores (on the same server) were fine.
A rescan/refresh didn't help. I removed all the VMs hosted on the problem datastore from the inventory, but I still wasn't able to remove the datastore itself.
Trying to remove it gave this error: "HostDatastoreSystem.RemoveDatastore" for object on vCenter Server .
On the ESXi hosts I ran /etc/init.d/storageRM stop, rescanned, and then restarted storageRM. This got rid of the datastore in the vCenter console. I then removed the LUN from the iSCSI adapter and added it back, which worked fine. But when I try to add it as a datastore under Configuration > Storage, I get another error: "unable to read the partition information for device".
It's VMFS5 on a mirrored RAID1 array, 4 TB.
I've logged into the ESXi shell directly on one of the hosts and used partedUtil to investigate and try to repair it.
I get the following if I run getUsableSectors or getptbl:
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size.
diskPath (/dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097) diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974)
Warning: The available space to /dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097 appears to have shrunk. This may happen if the disk size has reduced. The space has been reduced by (15627548288 blocks). You can fix the GPT to correct the available space or continue with the current settings ? This will also move the backup table at the end if it is not at the end already.
diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974) NewLastUsableLBA (7813774686)
Error: Can't have a partition outside the disk!
Unable to read partition table for device /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097
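The numbers in that output can be cross-checked with a little arithmetic. This is a minimal sketch assuming 512-byte sectors and the usual GPT layout (the backup header on the last sector, preceded by 32 sectors of partition entries); the variable names are illustrative and the values are copied from the partedUtil output.

```python
SECTOR = 512  # assumption: 512-byte logical sectors

disk_size = 7813774720       # diskSize: sectors the host currently sees
alternate_lba = 23441323007  # AlternateLBA: where the primary header says the backup header is
last_usable = 23441322974    # LastUsableLBA recorded in the primary header

# The backup header normally occupies the very last sector, so the primary
# header implies a disk of alternate_lba + 1 sectors.
implied_size = alternate_lba + 1
shrink = implied_size - disk_size

# Assumed standard layout: 32 sectors of entries + 1 backup header sector at
# the end, so the last usable LBA sits 33 sectors before the backup header.
assert alternate_lba - last_usable == 33
new_last_usable = disk_size - 1 - 33

print(f"current size : {disk_size * SECTOR / 1e12:.2f} TB")
print(f"implied size : {implied_size * SECTOR / 1e12:.2f} TB")
print(f"shrunk by    : {shrink} sectors")
print(f"new last LBA : {new_last_usable}")
```

With these values the primary header describes a disk of roughly 12.00 TB, while the LUN the host sees is roughly 4.00 TB; the "reduced by (15627548288 blocks)" and NewLastUsableLBA (7813774686) figures follow directly from that difference.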
Trying to fix it:
partedUtil fixGpt /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097
FixGpt tries to fix any problems detected in GPT table. Please ensure that you don't run this on any RDM (Raw Device Mapping) disk.
Are you sure you want to continue (Y/N): y
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size.
diskPath (/dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097) diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974)
Fix/Ignore/Cancel? fix
Error: Can't have a partition outside the disk!
Unable to read partition table on device /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097
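Since fixGpt trips over the same header values every time, it may help to read the primary GPT header directly and see what it actually records. This is a minimal read-only sketch for Python 3, assuming 512-byte sectors and a raw copy of the first 34 sectors of the LUN saved to a file (lun_head.bin is a hypothetical name); the field offsets follow the GPT header layout in the UEFI specification, and parse_gpt_header is an illustrative helper, not part of any tool.

```python
import struct
import zlib

SECTOR = 512  # assumption: 512-byte logical sectors

def parse_gpt_header(raw: bytes) -> dict:
    """Decode the primary GPT header stored at LBA 1 (read-only)."""
    hdr = raw[SECTOR:2 * SECTOR]
    if hdr[0:8] != b"EFI PART":
        raise ValueError("no GPT signature at LBA 1")
    header_size = struct.unpack_from("<I", hdr, 12)[0]
    stored_crc = struct.unpack_from("<I", hdr, 16)[0]
    # The header CRC32 is computed over header_size bytes with its own CRC field zeroed.
    check = bytearray(hdr[:header_size])
    check[16:20] = b"\x00\x00\x00\x00"
    # MyLBA, AlternateLBA, FirstUsableLBA, LastUsableLBA are consecutive 64-bit fields.
    my_lba, alt_lba, first_usable, last_usable = struct.unpack_from("<4Q", hdr, 24)
    entries_lba = struct.unpack_from("<Q", hdr, 72)[0]
    n_entries, entry_size = struct.unpack_from("<II", hdr, 80)
    return {
        "crc_ok": (zlib.crc32(bytes(check)) & 0xFFFFFFFF) == stored_crc,
        "my_lba": my_lba,
        "alternate_lba": alt_lba,
        "first_usable": first_usable,
        "last_usable": last_usable,
        "entries_lba": entries_lba,
        "n_entries": n_entries,
        "entry_size": entry_size,
    }
```

For example, parse_gpt_header(open("lun_head.bin", "rb").read(34 * 512)). If crc_ok is True and alternate_lba is still 23441323007, the header is internally consistent, which suggests it was written for a larger device; if crc_ok is False, the header itself is damaged.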
One of the other datastores is identical, on identical disks, so I tried setptbl using the partition size from that one:
partedUtil setptbl /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097 gpt "1 2048 7813774686 AA31E02A400F11DB9590000C2911D1B8 0"
gpt
0 0 0 0
1 2048 7813774686 AA31E02A400F11DB9590000C2911D1B8 0
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size.
diskPath (/dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097) diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974)
Warning: The available space to /dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097 appears to have shrunk. This may happen if the disk size has reduced. The space has been reduced by (15627548288 blocks). You can fix the GPT to correct the available space or continue with the current settings ? This will also move the backup table at the end if it is not at the end already.
diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974) NewLastUsableLBA (7813774686)
Error: Can't have a partition outside the disk!
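The "Can't have a partition outside the disk!" message appears to concern the existing partition entries rather than only the header, so a second read-only sketch could list those entries and check each one against the current disk size. It assumes the standard 128 entries of 128 bytes starting at LBA 2, 512-byte sectors, and a raw copy of the first 34 sectors of the LUN; parse_entries and fits are hypothetical helper names. The VMFS type GUID is the AA31E02A400F11DB9590000C2911D1B8 value used in the setptbl command.

```python
import struct
import uuid

SECTOR = 512  # assumption: 512-byte logical sectors
# VMFS partition type GUID, as passed to partedUtil in canonical form.
VMFS_TYPE = uuid.UUID("AA31E02A-400F-11DB-9590-000C2911D1B8")

def parse_entries(raw: bytes, entries_lba=2, n_entries=128, entry_size=128):
    """List non-empty GPT partition entries (GUIDs are stored little-endian on disk)."""
    parts = []
    base = entries_lba * SECTOR
    for i in range(n_entries):
        e = bytes(raw[base + i * entry_size: base + (i + 1) * entry_size])
        if len(e) < entry_size or e[:16] == bytes(16):
            continue  # unused slot (zero type GUID) or data not present in the copy
        type_guid = uuid.UUID(bytes_le=e[:16])
        first, last = struct.unpack_from("<QQ", e, 32)  # StartingLBA, EndingLBA
        parts.append({"index": i + 1, "type": type_guid, "first": first, "last": last})
    return parts

def fits(parts, disk_size):
    """True if every partition ends at or before the last usable LBA for disk_size sectors."""
    last_usable = disk_size - 34  # 1 backup header + 32 entry sectors + 1 (zero-based LBA)
    return all(p["last"] <= last_usable for p in parts)
```

Comparing the VMFS entry's last LBA with 7813774686 (the NewLastUsableLBA partedUtil proposes) would show whether the existing entry, and not just the header, extends past the end of the 4 TB LUN; fits(parts, 7813774720) returns False in that case.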
On the iSCSI target host the LUNs show as healthy, and mdstat also shows the RAID array and its disks as healthy.
Is there anything else I can try to repair this and recover the VMs?
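Since the target host reports the LUN and array as healthy, one more check is whether the exported backing device has the sector count ESXi sees or the one the GPT header expects. This sketch assumes it runs with Python 3 as root on the Ubuntu target host and that the LUN is backed by a block device or image file; /dev/md0 below is a hypothetical path, so substitute whatever the target configuration actually exports. device_sectors and compare are illustrative helper names.

```python
import os

SECTOR = 512  # assumption: 512-byte logical sectors

ESXI_DISK_SIZE = 7813774720     # diskSize reported by partedUtil
GPT_IMPLIED_SIZE = 23441323008  # AlternateLBA + 1 from the primary GPT header

def device_sectors(path: str) -> int:
    """Size of a block device or image file in 512-byte sectors, opened read-only."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # seeking to the end yields the size in bytes
    finally:
        os.close(fd)
    return size // SECTOR

def compare(path: str) -> dict:
    n = device_sectors(path)
    return {
        "sectors": n,
        "matches_esxi": n == ESXI_DISK_SIZE,
        "matches_gpt": n == GPT_IMPLIED_SIZE,
    }
```

For example, compare("/dev/md0"). If matches_esxi is True, the target is exporting the expected 4 TB and the problem lies in the on-disk GPT; if neither matches, the target may be exporting a different device or size than intended.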
Thanks for helping.