I have a new Debian VPS that fails almost every time I run a heavy disk write test on an ext4 filesystem. The filesystem goes into read-only mode and "ata1: lost interrupt (Status 0x50)" is logged to /var/log/messages. What do you suggest I do next? Are there filesystem or server parameters that I could change? Is there a way to debug this further? Should I switch to ext3 for good? Or should I switch from Debian to Ubuntu or CentOS?

Here is what has happened so far. I got a new VQ 12 VPS from Hetzner. I have a standard test procedure that I run on servers before accepting them into production use. I installed 64-bit Debian from the Hetzner image, upgraded it to the latest patches and started testing. When I ran dd write commands on ext4 partitions, such as

dd if=/dev/zero of=/root/test.bin bs=2M count=4k conv=fdatasync

I almost immediately got

dd: writing `/root/test.bin': Read-only file system

and found a line

kernel: [  457.816093] ata1: lost interrupt (Status 0x50)

in /var/log/messages. The filesystem can be recovered, and it also recovers on reboot

May  5 19:54:29 ****vq12 kernel: [    1.772377] EXT4-fs (sda3): INFO: recovery required on readonly filesystem
May  5 19:54:29 ****vq12 kernel: [    1.773184] EXT4-fs (sda3): write access will be enabled during recovery
May  5 19:54:29 ****vq12 kernel: [    2.001101] EXT4-fs warning (device sda3): ext4_clear_journal_err: Filesystem error recorded from previous mount: IO failure
May  5 19:54:29 ****vq12 kernel: [    2.002159] EXT4-fs warning (device sda3): ext4_clear_journal_err: Marking fs in need of filesystem check.
May  5 19:54:29 ****vq12 kernel: [    2.004316] EXT4-fs (sda3): recovery complete
May  5 19:54:29 ****vq12 kernel: [    2.005316] EXT4-fs (sda3): mounted filesystem with ordered data mode

but as soon as I continue testing, the issue occurs again. I contacted support and they suggested increasing /sys/block/sda/device/timeout, but that appears to have no effect.

root@****vq12 ~ # echo "600" > /sys/block/sda/device/timeout
root@****vq12 ~ # cat /sys/block/sda/device/timeout
600
root@****vq12 ~ # mount | grep " / "
/dev/sda3 on / type ext4 (rw)
root@****vq12 ~ # dd if=/dev/zero of=/test.bin bs=2M count=4k conv=fdatasync
dd: writing `/test.bin': Read-only file system
3096+0 records in
3095+0 records out
6492217344 bytes (6.5 GB) copied, 116.353 s, 55.8 MB/s
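
To pin down the moment the remount happens during a test run, a small check like the following could be run right after each dd command. This is only a minimal sketch: `mount_is_ro` is a hypothetical helper name, not part of any tool, and it assumes the mount options are the fourth whitespace-separated field of each line in /proc/mounts.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not a standard tool).
# Usage: mount_is_ro MOUNTPOINT [MOUNTS_FILE]
# Returns 0 if MOUNTPOINT is currently mounted read-only, 1 otherwise.
# MOUNTS_FILE defaults to /proc/mounts; the parameter exists so the
# function can be tried against a saved copy.
mount_is_ro() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }   # remember options of the last matching entry
        END {
            n = split(opts, o, ",")
            # Compare whole options so "errors=remount-ro" is not mistaken for "ro".
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }' "${2:-/proc/mounts}"
}
```

For example, `mount_is_ro / && dmesg | tail -n 20` would print the most recent kernel messages only when the root filesystem has gone read-only.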

They also migrated the VPS to another node, but that didn't help either. I've tested both 32-bit and 64-bit Debian, straight from the image and upgraded to the latest patches, and the problem occurs consistently on all combinations. ext3, however, seems to be unaffected. Ubuntu and CentOS, even with ext4 (tested on the same server), seem to be unaffected as well.

Recently I have run the same write test on a number of Debian and other Linux servers (including some Hetzner VQ, EQ and EX servers), and this has never occurred before.

XDF
  • I would say that with a VPS and a hoster-supplied OS image (as long as you did not mess with the kernel/driver stack and settings), this is decidedly still a problem for the hoster to solve. This is my opinion and not legal advice. The fact that ext3 seems to run fine might just mask an underlying problem that might just as well be at the hypervisor/hardware level, even if it happens on both (probably hardware-identical) nodes. – rackandboneman May 09 '12 at 23:36
  • I just had a problem that seems similar to yours, also on Hetzner's VQ12, except that I ran a `hdparm -t` test (which completed successfully) and the problem showed up a few hours later. It does look strange, and I plan to contact their support. May I ask whether you resolved the problem? – liori Dec 26 '12 at 17:52
  • The problem was not resolved. I ended up having spent too much time investigating the issue with no clue what to try next. I decided not to use that combination (Hetzner VQ + Debian + ext4) and went back to ext3. – XDF Dec 27 '12 at 14:43

0 Answers