
We are seeing instances of a VM hanging with the following symptoms:

  • load average of 800, processes stuck, CPU at 100% iowait
  • reading files works, but writing files hangs the system
  • RAM utilization is high, but that is expected when the system is working properly
  • /var/log/messages does not show anything suspicious: no kernel crash, no OOM kill. It does contain kernel hung-task warnings (tasks blocked for more than 120 seconds) with stack traces related to storage.
  • the hypervisor shows the VM almost idle in terms of CPU utilization. Rebooting the system is the only way to make it work again.
  • the stack traces in dmesg mention kernel tasks hung for more than 120 seconds in io_write / sync syscalls
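
A load average this high with the CPU in iowait is consistent with many tasks sitting in uninterruptible sleep (state D), since Linux counts those tasks in the load average. Below is a minimal sketch of how the stuck processes could be listed from inside the guest, assuming the usual /proc/<pid>/stat layout. The function names are illustrative only and do not come from any existing tool, and the script checks only each process's main thread.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep (state D)."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no procfs available at this path
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited during the scan, or the file is unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Reading /proc does not write to disk, so this should still run while writes are hanging, provided the interpreter can be started.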

The hypervisor runs Oracle Enterprise Linux 7.2 and the VM runs CentOS 6.6, hosting a JBoss appliance. The block device is of type virtio. The qcow drive is stored locally on the hypervisor, on an SSD. We suspect something is wrong in the filesystem -> block device -> virtio layer.

nicolasochem
