I have a new Debian VPS that fails almost every time I run a heavy disk write test on an ext4 filesystem. The filesystem goes read-only and "ata1: lost interrupt (Status 0x50)" is logged in /var/log/messages. What would you suggest I do next? Are there any filesystem or server parameters I could change? Is there a way to debug this more deeply? Should I switch to ext3 for good? Or should I switch from Debian to Ubuntu or CentOS?
Here is what has happened so far. I got a new VQ 12 VPS from Hetzner. I have a standard test procedure that I run on servers before accepting them into production use. I installed 64-bit Debian from the Hetzner image, upgraded it to the latest patches and started testing. When I ran dd write commands on ext4 partitions, such as
dd if=/dev/zero of=/root/test.bin bs=2M count=4k conv=fdatasync
I almost immediately got
dd: writing `/root/test.bin': Read-only file system
and found a line
kernel: [ 457.816093] ata1: lost interrupt (Status 0x50)
in /var/log/messages. The filesystem can be recovered, and it also recovers on reboot:
May 5 19:54:29 ****vq12 kernel: [ 1.772377] EXT4-fs (sda3): INFO: recovery required on readonly filesystem
May 5 19:54:29 ****vq12 kernel: [ 1.773184] EXT4-fs (sda3): write access will be enabled during recovery
May 5 19:54:29 ****vq12 kernel: [ 2.001101] EXT4-fs warning (device sda3): ext4_clear_journal_err: Filesystem error recorded from previous mount: IO failure
May 5 19:54:29 ****vq12 kernel: [ 2.002159] EXT4-fs warning (device sda3): ext4_clear_journal_err: Marking fs in need of filesystem check.
May 5 19:54:29 ****vq12 kernel: [ 2.004316] EXT4-fs (sda3): recovery complete
May 5 19:54:29 ****vq12 kernel: [ 2.005316] EXT4-fs (sda3): mounted filesystem with ordered data mode
but as soon as I continue testing, the issue occurs again. I contacted support and they suggested increasing /sys/block/sda/device/timeout, but that appears to make no difference:
root@****vq12 ~ # echo "600" > /sys/block/sda/device/timeout
root@****vq12 ~ # cat /sys/block/sda/device/timeout
600
root@****vq12 ~ # mount | grep " / "
/dev/sda3 on / type ext4 (rw)
root@****vq12 ~ # dd if=/dev/zero of=/test.bin bs=2M count=4k conv=fdatasync
dd: writing `/test.bin': Read-only file system
3096+0 records in
3095+0 records out
6492217344 bytes (6.5 GB) copied, 116.353 s, 55.8 MB/s
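To capture the state of the system right after each failure, the test could be wrapped in a small script along these lines. This is only a sketch: the helper names root_mount_state and disk_errors are made up for illustration, and it assumes the disk is sda (as above) and that the kernel log can be read with dmesg.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of any package.

# Print "rw" or "ro" for the root filesystem, read from a mounts table
# (defaults to /proc/mounts). If "/" is listed more than once, the last
# entry wins.
root_mount_state() {
    awk '$2 == "/" {
             n = split($4, o, ",")
             for (i = 1; i <= n; i++)
                 if (o[i] == "ro" || o[i] == "rw") s = o[i]
         }
         END { print s }' "${1:-/proc/mounts}"
}

# Keep only the log lines that mention an ATA port or ext4 (reads stdin).
disk_errors() {
    grep -E 'ata[0-9]+|EXT4-fs'
}

# Run the write test only when called as: sh thisscript.sh run
if [ "${1:-}" = "run" ]; then
    dd if=/dev/zero of=/test.bin bs=2M count=4k conv=fdatasync
    echo "dd exit status: $?"
    echo "root filesystem is now: $(root_mount_state)"
    echo "device timeout: $(cat /sys/block/sda/device/timeout)"
    dmesg | disk_errors | tail -n 20
fi
```

Running it with the run argument after each reboot gives one comparable record per failure (exit status, mount state, timeout in effect, and the matching kernel messages).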
They also migrated the VPS to another node, but that didn't help either. I've tested both 32-bit and 64-bit Debian, both straight from the image and upgraded to the latest patches, and the problem occurs consistently on all combinations. ext3, however, seems to be unaffected, and so do Ubuntu and CentOS, even with ext4 (tested on the same server).
I have recently run the same write test on a number of Debian and other Linux servers (including some Hetzner VQ, EQ and EX servers), and this has never occurred before.
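Since the same VPS behaves differently under each distribution, one thing I could do is collect the same few facts on every install and diff them. A rough sketch, assuming the disk is sda on all of them; mount_info is a hypothetical helper name, and the choice of which facts to compare is my own guess at what might differ:

```shell
#!/bin/sh
# Sketch only: mount_info is a hypothetical helper, not an existing tool.

# Print "fstype options" for a mount point, read from a mounts table
# (defaults to /proc/mounts). Prints nothing if the mount point is absent.
mount_info() {
    awk -v mp="$1" '$2 == mp { r = $3 " " $4 }
                    END { if (r != "") print r }' "${2:-/proc/mounts}"
}

# Print the report only when called as: sh thisscript.sh report
if [ "${1:-}" = "report" ]; then
    echo "kernel:    $(uname -r)"
    echo "root fs:   $(mount_info /)"
    echo "scheduler: $(cat /sys/block/sda/queue/scheduler)"
    echo "timeout:   $(cat /sys/block/sda/device/timeout)"
fi
```

Saving the output to one file per distribution (for example debian.txt and ubuntu.txt) and running diff on the files would show whether the kernel version, the ext4 mount options or the I/O scheduler differ between the affected and unaffected installs.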