I have a setup of two nodes that use DRBD to replicate KVM VPSes for failover, so each VPS is only active on one node. The active node currently runs 4 KVM VPSes.

The two nodes have a dedicated 10G interface for DRBD sync, so replication should not be an I/O bottleneck.

Sysbench reports a disk I/O throughput of about 400MB/s.

The problem is that, at random intervals, one of the VPSes peaks at about 400MB/s of I/O (the same as the disk I/O limit) and becomes unresponsive. The other VPSes remain responsive during that time. I'm unable to find what is causing the high I/O: the affected VPS is unresponsive, so I can't log in over SSH while it is happening. I do use telegraf->influxdb to monitor the VPS, and there I can see the I/O climbing, but I'm not sure how to use that data to find which application or user is causing the load, or why only this VPS is affected and not the others, even though they all use the same underlying DRBD devices and disks.

Any suggestion on how to debug this?

Vincent
  • The title of your post makes it seem like DRBD is impacting the performance of your I/O, but the real issue (if I'm understanding correctly) seems to be that you can't pinpoint the process saturating the block device in a KVM guest, which just so happens to be sitting on top of a DRBD device. Consider changing the title to be more reflective of the issue at hand: "Identifying processes saturating VPS IO", or something. – Matt Kereczman Apr 12 '18 at 16:49
  • Yes, I'm indeed not blaming DRBD for the high I/O. So I changed the title like you suggested. – Vincent Apr 16 '18 at 11:33

1 Answer

You can use iotop inside the VPS to identify which process is generating the 400MB/s of I/O, if you happen to catch it during one of the spikes.

Or you could run something like pidstat -d at a fixed interval and write its reports to a file, which you can sift through later to see which process ran away with your disks.
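If neither tool is installed in the guest, the same idea can be roughed out in Python by sampling the per-process counters in /proc/<pid>/io and logging the biggest movers. This is only a sketch, not a replacement for pidstat: it assumes a Linux guest with task I/O accounting enabled, it has to run as root to see other users' processes, and the 10 second interval and the io-top.log file name are placeholders I made up.

```python
#!/usr/bin/env python3
"""Log the processes with the highest disk I/O at a fixed interval (Linux only)."""
import os
import time


def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def snapshot():
    """Return {pid: (command, read_bytes, write_bytes)} for all readable processes."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/io") as f:
                io = parse_proc_io(f.read())
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except OSError:
            # The process exited, or we lack permission (run as root to see all).
            continue
        result[int(entry)] = (comm, io.get("read_bytes", 0), io.get("write_bytes", 0))
    return result


def top_io(before, after, limit=5):
    """Return up to `limit` rows of (pid, command, read_delta, write_delta),
    busiest first, for the period between two snapshots."""
    rows = []
    for pid, (comm, rd, wr) in after.items():
        # Processes that appeared during the interval are counted from zero.
        _, prev_rd, prev_wr = before.get(pid, (comm, 0, 0))
        d_rd, d_wr = max(rd - prev_rd, 0), max(wr - prev_wr, 0)
        if d_rd or d_wr:
            rows.append((pid, comm, d_rd, d_wr))
    rows.sort(key=lambda r: r[2] + r[3], reverse=True)
    return rows[:limit]


def main(interval=10, logfile="io-top.log"):
    """Append the busiest processes to `logfile` every `interval` seconds, forever."""
    before = snapshot()
    while True:
        time.sleep(interval)
        after = snapshot()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(logfile, "a") as log:
            for pid, comm, d_rd, d_wr in top_io(before, after):
                log.write(f"{stamp} pid={pid} comm={comm} read_B={d_rd} write_B={d_wr}\n")
        before = after
```

Saved under a name of your choosing and started with a call to main() as root inside the affected VPS, this appends a few lines per interval. After the next hang, the last entries before the gap in the log should point at the process responsible. Once the disk is saturated the log writes themselves may stall, so a short interval gives you a better chance of capturing the start of the spike.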

400MB/s is a lot of IO to be unaccounted for, so I hope you track that down!

Matt Kereczman