We've been running some production services on Amazon EC2 for a while, mainly on m1.large and m1.xlarge instances (non-EBS). Every so often one of the attached ephemeral disks gets into a state of 100% utilization (as reported by iostat -xtc).
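In case it helps, this is roughly how the condition shows up. The one-liner below is just an illustrative sketch (the 99% threshold and the 5-second sample are arbitrary, not part of our actual monitoring):

    # Take a 5-second extended-stats sample and flag any device whose %util
    # (last column of iostat -x output) is pegged near 100%.
    iostat -dx 5 2 | awk 'NR > 1 && $NF+0 > 99 { print $1, "%util =", $NF }'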
When a disk gets into this state it is effectively unusable. A reboot fixes the issue, seemingly without any corruption. Occurrences appear random and happen every few weeks.
I'm not sure whether any of our software is related, but we're running up-to-date Ubuntu 10.04 (Lucid). The ephemeral disks are currently striped (RAID 0) with LVM; previously we used mdadm in conjunction with LVM.
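For context, the striping is set up along these lines (a simplified sketch; the device names, volume group/logical volume names, and stripe size are placeholders rather than our exact values):

    # Stripe the two ephemeral disks into a single logical volume (RAID 0 style).
    pvcreate /dev/sdb /dev/sdc
    vgcreate ephemeral /dev/sdb /dev/sdc
    lvcreate -i 2 -I 64 -l 100%FREE -n data ephemeral   # 2 stripes, 64 KiB stripe size
    mkfs.ext4 /dev/ephemeral/data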
Has anyone else seen this behavior (I'm not sure it is specific to EC2)? Any ideas on how to avoid it, or how to recover without rebooting?