A bit of background: I have an ESXi 5.5 cluster with vCenter HA. I have multiple iSCSI LUNs hosted on an Ubuntu server running an iSCSI target on top of software RAID (mdadm).

A few days ago I noticed that a bunch of VMs were inaccessible. I removed them from inventory, thinking I'd add them back by browsing the datastore.

The datastore was showing as inactive. The other datastores (on the same server) were fine. A rescan/refresh didn't work. I removed from inventory all the VMs hosted on the problem datastore, but I still wasn't able to remove the datastore itself. The attempt failed with:
"HostDatastoreSystem.RemoveDatastore" for object on vCenter Server .

On the ESXi hosts I ran /etc/init.d/storageRM stop, then rescanned and restarted storageRM. This got rid of the datastore in the vCenter console. I then removed the target from the iSCSI adapter and added it back, which worked fine. But when I try to add it as a datastore under Configuration/Storage I get another error: unable to read the partition information for the device.

It's VMFS5 on a mirrored RAID1 array, 4 TB.

I've logged on to the ESXi shell directly on one of the hosts and used partedUtil to investigate and try to repair it.

I get the following if I run getUsableSectors or getptbl:

Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size. diskPath (/dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097) diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974)

Warning: The available space to /dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097 appears to have shrunk. This may happen if the disk size has reduced. The space has been reduced by (15627548288 blocks). You can fix the GPT to correct the available space or continue with the current settings ? This will also move the backup table at the end if it is not at the end already. diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974) NewLastUsableLBA (7813774686)

Error: Can't have a partition outside the disk!

Unable to read partition table for device /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097
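For reference, the numbers in that output can be cross-checked with a little arithmetic. This is only a sketch, and it assumes 512-byte logical sectors (the output does not state the sector size). Under that assumption the device is about 4 TB, while the primary GPT header describes a disk of roughly 12 TB, and the difference matches the "reduced by" figure exactly.

```python
# Cross-check of the figures partedUtil printed.
# Assumption: 512-byte logical sectors (not stated in the output).
SECTOR_BYTES = 512

disk_size = 7813774720           # diskSize, in sectors
alternate_lba = 23441323007      # AlternateLBA: where the primary GPT expects the backup header
last_usable_lba = 23441322974    # LastUsableLBA recorded in the primary GPT
new_last_usable_lba = 7813774686 # NewLastUsableLBA proposed by partedUtil


def sectors_to_tb(sectors):
    """Convert a sector count to decimal terabytes."""
    return sectors * SECTOR_BYTES / 1e12


# The backup GPT header lives in the last sector, so the disk the
# primary header describes has alternate_lba + 1 sectors.
gpt_expected_sectors = alternate_lba + 1

# How much smaller the real device is than the GPT expects.
shrink_sectors = gpt_expected_sectors - disk_size

# With 512-byte sectors, 32 sectors of backup partition entries sit
# between the last usable sector and the backup header.
gap_to_backup_header = alternate_lba - last_usable_lba

print("actual device size (TB):  ", round(sectors_to_tb(disk_size), 2))
print("size the GPT expects (TB):", round(sectors_to_tb(gpt_expected_sectors), 2))
print("missing sectors:          ", shrink_sectors)
print("gap to backup header:     ", gap_to_backup_header)
```

If the assumption holds, the GPT on this LUN looks like it was written for a device about three times larger than the 4 TB mirror, which is consistent with either a corrupted header or a table that came from some other disk. The arithmetic alone cannot tell those apart.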

Trying to fix it:

partedUtil fixGpt /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097

FixGpt tries to fix any problems detected in GPT table. Please ensure that you don't run this on any RDM (Raw Device Mapping) disk. Are you sure you want to continue (Y/N): y

Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size. diskPath (/dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097) diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974) Fix/Ignore/Cancel? fix

Error: Can't have a partition outside the disk!

Unable to read partition table on device /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097

One of the other datastores is identical, with identical disks, so I tried setptbl using the partition size from that one.

partedUtil setptbl /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097 gpt "1 2048 7813774686 AA31E02A400F11DB9590000C2911D1B8 0"

gpt
0 0 0 0
1 2048 7813774686 AA31E02A400F11DB9590000C2911D1B8 0

Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size. diskPath (/dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097) diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974)

Warning: The available space to /dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097 appears to have shrunk. This may happen if the disk size has reduced. The space has been reduced by (15627548288 blocks). You can fix the GPT to correct the available space or continue with the current settings ? This will also move the backup table at the end if it is not at the end already. diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974) NewLastUsableLBA (7813774686)

Error: Can't have a partition outside the disk!

On the iscsitarget host the LUNs show as healthy. mdstat also shows a healthy RAID array and healthy disks.

Is there anything else I can try to repair this and recover the VMs?

Thanks for helping.

infidel
  • Have you run a "check" on the array via `echo check > /sys/block/mdX/md/sync_action`? This entry on the kernel wiki has more information: https://raid.wiki.kernel.org/index.php/RAID_Administration – Mike Andrews Jan 12 '17 at 19:24
  • I ran a check and tried again, same problem. I then tried a repair, with the same result. – infidel Jan 15 '17 at 18:26
  • I used gdisk and sgdisk to fix the GPT and copy the partition table from an identical RAID set. This worked and got me to the point where the storage is recognised in ESXi. However, it looks like the VMFS5 file system is gone: ESXi wants to create a file system when attaching the storage. I used the Linux vmfs-tools to try to recover any data, but I get an error about missing LVM magic, so it can't be mounted. I think my only recourse at this point is to use data recovery software to see if any of the original data can be salvaged. – infidel Jan 16 '17 at 13:23
  • Yeah, that's too bad. I don't know how you'd prove it, but it sounds like quite a bit of the beginning of that device was overwritten, perhaps by human error. It could be that permissions weren't set up quite right and some other machine attached those LUNs. Or, it could even have been some operator mistake on the machine that's hosting the storage. Little consolation now, but remember, RAID isn't a backup: http://serverfault.com/questions/2888/why-is-raid-not-a-backup . Good luck! – Mike Andrews Jan 16 '17 at 15:32
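On the "missing lvm magic" error mentioned in the comments: a quick, read-only way to see whether the VMFS volume header is still present is to look for its magic number directly. The sketch below rests on assumptions that should be verified before relying on it: the offset (0x100000 into the partition) and the magic value (0xC001D00D) are taken from my recollection of the vmfs-tools source, not from VMware documentation, and the sector size is assumed to be 512 bytes. The partition start of 2048 comes from the setptbl command in the question. The function name `has_vmfs_lvm_magic` is hypothetical.

```python
import struct

# Assumptions (recalled from the vmfs-tools source, please verify):
SECTOR_BYTES = 512          # assumed logical sector size
VOLINFO_OFFSET = 0x100000   # assumed offset of the LVM volume info inside the partition
VOLINFO_MAGIC = 0xC001D00D  # assumed LVM magic, stored little-endian


def has_vmfs_lvm_magic(path, partition_start_sector=2048):
    """Return True if the assumed VMFS LVM magic is found at the expected offset.

    `path` is a block device or image file, opened read-only.
    `partition_start_sector` defaults to 2048, the start used in the
    setptbl command from the question.
    """
    offset = partition_start_sector * SECTOR_BYTES + VOLINFO_OFFSET
    with open(path, "rb") as dev:
        dev.seek(offset)
        raw = dev.read(4)
    if len(raw) < 4:
        # Device or image is too short to contain the header at all.
        return False
    (magic,) = struct.unpack("<I", raw)
    return magic == VOLINFO_MAGIC
```

Run against the md device on the Ubuntu host (as root, for example `has_vmfs_lvm_magic("/dev/mdX")` with the real device name substituted), a False result would be consistent with the start of the volume having been overwritten, as suggested above. It would not by itself prove that the rest of the data is gone.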
