I have an ESXi server containing a large virtual disk (1.8 TB) on RAID 0, and the server suffered a power outage.
The disk was the storage drive for a Windows Server 2012 R2 file server. It is no longer accessible from Windows and is now behaving very strangely, in a way I haven't seen before.
The RAID is fine, the disks are fine, and the VMDK file seems fine. I can also access all files without issues through a Linux live ISO, and Linux tools such as ntfsfix, fsck, etc. all say the drive is good.
However, this is the weird bit: Windows has some major issues with it. I cannot boot Windows or WinPE with the drive attached; it will not start. I have to hot-add the drive after Windows has booted. chkdsk /f will sometimes report that the partition is NTFS and then hang without further output for hours. Otherwise it hangs without any output at all.
It seems any IO operation on the drive causes the calling process to hang. Refreshing Disk Management causes it to stop responding. diskpart will print its version info and the computer name, and then hang.
Looking into the event log, I see small traces that it's attempting to do something in the background, such as Event ID 153, with a description of:
The IO operation at logical block address 0x2aa72bd8 for Disk 1 (PDO name: \Device\00000033) was retried.
There is roughly one log entry like the one above every 20-30 minutes.
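To get a rough idea of where on the disk that IO is happening, the logical block address from the event can be converted to a byte offset. This is only a minimal sketch: it assumes 512-byte logical sectors, which has not been verified for this virtual disk, and the function names are made up for illustration.

```python
# Assumption: the disk exposes 512-byte logical sectors (not confirmed here).
SECTOR_SIZE = 512


def lba_to_byte_offset(lba, sector_size=SECTOR_SIZE):
    """Return the byte offset on the disk where the given logical block starts."""
    return lba * sector_size


def describe(lba_hex, sector_size=SECTOR_SIZE):
    """Format an LBA given as a hex string (as shown in the event) as an offset."""
    lba = int(lba_hex, 16)  # base 16 accepts the leading "0x"
    offset = lba_to_byte_offset(lba, sector_size)
    return f"LBA {lba} -> byte offset {offset} ({offset / 2**30:.1f} GiB into the disk)"


if __name__ == "__main__":
    # LBA taken from the Event ID 153 description above.
    print(describe("0x2aa72bd8"))
```

If the sector size is actually 4096 bytes, pass sector_size=4096 instead; the offset scales accordingly.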
Unfortunately, migrating the files to another drive through Linux is not possible at the moment, nor is replacing the disks, which are reporting good health.
Questions:
- I'm assuming some sort of disk check runs when the drive is brought online, but I'm not seeing any kind of progress. Is there a log source, log file, or something similar where I can see what's going on?
- If the above is true but there are no logs, is there a way to abort the check and check manually with chkdsk?
- Any other suggestions?