Today I had to fix 3 VMs with near-identical file system errors, all running Red Hat 5 or 6. The VMs had corrupted inodes and had remounted / as a read-only file system; several MySQL tables and several mounted drives were corrupted as well. Running fsck -A -y on each server and manually repairing the corrupted MySQL tables solved the problems.
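
For anyone checking other servers for the same symptom, here is a minimal sketch of how a read-only remount could be spotted from the contents of /proc/mounts. The function name `find_readonly_mounts` and the list of file system types are my own assumptions for illustration, not something from the affected servers; adjust the types to whatever your servers actually use.

```python
# File system types worth checking. This list is an assumption; adjust as needed.
DISK_FS_TYPES = ("ext2", "ext3", "ext4", "xfs")


def find_readonly_mounts(mounts_text, fs_types=DISK_FS_TYPES):
    """Return (device, mount_point, fs_type) for each listed file system mounted read-only.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    readonly = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        # Options are comma-separated; "ro" marks a read-only mount.
        if fs_type in fs_types and "ro" in options.split(","):
            readonly.append((device, mount_point, fs_type))
    return readonly
```

On a live server the input would be the text of /proc/mounts, for example `find_readonly_mounts(open("/proc/mounts").read())`. An empty result only means nothing is currently mounted read-only; it says nothing about whether the underlying storage is healthy.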

I wasn't present when the corruption happened, but my understanding is that it hit all of the servers simultaneously; whether it was triggered by a service crashing, I don't know. I can't find anything in dmesg, messages, audit.log, or mysqld.log to indicate the underlying cause. I'm not an expert server admin, especially on Red Hat (my background is Ubuntu), so any ideas or guidance on troubleshooting would be very welcome.

The underlying architecture is a VMware hypervisor.

I've tentatively suggested that a fault in the underlying data storage device could explain why all 3 servers were corrupted at once.

Has anyone had experience with something similar?

Thanks.

  • If the drives are virtual, possibly on a SAN or local RAID array, can you do a read-only integrity scan of the RAID array or SAN? It sounds like maybe the RAID controller picked a silently failed disk to read from at the wrong time. – Slartibartfast Apr 03 '14 at 03:28
  • Possibly. We'll know more when the provider comes in and checks the disks, but I've found another Red Hat server that also had a failure. – Justin Mitchell Apr 03 '14 at 04:51

1 Answer

This is most likely a fault in the underlying VMware system (storage), and NOT in the individual VMs. You should speak with your VMware team about this; they can probably diagnose it further.

MichelZ
  • Thanks for that, it looks to be the case. All of the Red Hat 5/6 servers seem to have been affected by something, and running a file system check sorted them out. Ubuntu and FreeBSD were unaffected. – Justin Mitchell Apr 03 '14 at 06:22