
I'm seeing a much larger performance hit with DRBD than its user manual says I should expect. I'm using DRBD 8.3.7 (Fedora 13 RPMs).

I've set up a DRBD test and measured disk and network throughput without DRBD:

dd if=/dev/zero of=/data.tmp bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 4.62985 s, 116 MB/s

/ is a logical volume on the disk I'm testing with, mounted without DRBD

iperf:

[  4]  0.0-10.0 sec  1.10 GBytes   941 Mbits/sec

According to the manual's "Throughput overhead expectations" section, the bottleneck should be whichever is slower, the network or the disk, and DRBD should add an overhead of about 3%. In my case the network and the disk seem to be pretty evenly matched, so it sounds like I should be able to get around 100 MB/s.
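
For reference, that estimate is just the manual's ~3% figure applied to the numbers above; a rough back-of-the-envelope, nothing more:

# expected throughput ≈ slower of disk and network, minus ~3% DRBD overhead
# disk:    116 MB/s              (dd above)
# network: 941 Mbit/s ≈ 117 MB/s (iperf above)
echo "116 * 0.97" | bc    # ≈ 112 MB/s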

So, with the raw drbd device, I get

dd if=/dev/zero of=/dev/drbd2 bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 6.61362 s, 81.2 MB/s

which is slower than I would expect. Then, once I format the device with ext4, I get

dd if=/dev/zero of=/mnt/data.tmp bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 9.60918 s, 55.9 MB/s

This doesn't seem right. There must be some other factor playing into this that I'm not aware of.

global_common.conf

global {
    usage-count yes;
}

common {
    protocol C;
}

syncer {
    al-extents 1801;
    rate 33M;
}

data_mirror.res

resource data_mirror {
    device /dev/drbd1;
    disk   /dev/sdb1;

    meta-disk internal;

    on cluster1 {
       address 192.168.33.10:7789;
    }

    on cluster2 {
       address 192.168.33.12:7789;
    }
}
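
As an aside, one quick sanity check before benchmarking (assuming the resource name above) is to confirm that no background resync is running, since a resync throttled by the 33M rate cap would skew the dd numbers:

cat /proc/drbd                # look for cs:Connected ds:UpToDate/UpToDate
drbdadm cstate data_mirror    # should report "Connected", not SyncSource/SyncTarget
drbdadm dstate data_mirror    # should report "UpToDate/UpToDate"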

For the hardware I have two identical machines:

  • 6 GB RAM
  • Quad-core AMD Phenom, 3.2 GHz
  • Motherboard SATA controller
  • 7200 RPM 64MB cache 1TB WD drive

The network is 1 Gbit, connected via a switch. I know a direct connection is recommended, but could it really make this much of a difference?

Edit:

I tried monitoring the bandwidth in use to see what's happening. I used ibmonitor and measured the average bandwidth while I ran the dd test 10 times. I got:

  • avg ~450 Mbit/s writing to ext4
  • avg ~800 Mbit/s writing to the raw device

It looks like with ext4, DRBD is using about half the bandwidth it uses with the raw device, so there's a bottleneck somewhere other than the network.
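
For anyone repeating this, a rough alternative to ibmonitor is to sample the kernel's interface counters over the length of a dd run; the interface name below is a placeholder for whichever link carries the DRBD traffic:

IFACE=eth0    # placeholder: the interface DRBD replicates over
B1=$(cat /sys/class/net/$IFACE/statistics/tx_bytes)
sleep 10
B2=$(cat /sys/class/net/$IFACE/statistics/tx_bytes)
echo "avg $(( (B2 - B1) * 8 / 10000000 )) Mbit/s sent on $IFACE"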

BHS
  • Can you attempt this with a large file with actual data? Writing all zeros may be a special case handled by the disk for benchmarks. – Jeff Strunk Nov 04 '11 at 17:11
  • The DRBD user manual recommends measuring throughput the way I did: http://www.drbd.org/users-guide-legacy/ch-benchmark.html That's a good idea; the only problem is that if I use a file, the read speed of the disk I'm reading it from comes into play. I'll see if I can figure out a way to do it that isn't too dependent on the source disk. – BHS Nov 04 '11 at 17:27
  • I tried the same dd commands except getting input from a file on another disk. I got similar results. – BHS Nov 04 '11 at 20:01
  • Can you please post your drbd configuration files? – Jeff Strunk Nov 07 '11 at 14:25
  • Can you please give more information about your hardware, too? – Jeff Strunk Nov 07 '11 at 14:26
  • I added configuration files and hardware info. Thanks. – BHS Nov 09 '11 at 06:10
  • Have you measured the actual network throughput with the switch and without? 1Gbps doesn't really mean you will ever actually get 1Gbps. It depends on the network chipset, the driver, the cables, the switch, etc. – Jeff Strunk Nov 09 '11 at 14:52
  • When I measure it with the switch I get ~940 Mbit/s. I would expect that if the switch were the cause, I would see a consistent performance hit regardless of whether I access the DRBD device raw or formatted. – BHS Nov 09 '11 at 15:29
  • What does it look like if you use XFS as the filesystem? – James Nov 10 '11 at 02:19
  • Have you tried the TCP tuning suggestions in 15.3.3 "Tuning TCP Send Buffer Size"? 15.3.1 "Setting max-buffers and max-epoch-size" might also be of value. Also, what is your network MTU for the DRBD link? – Kendall Jan 01 '12 at 21:50
  • Also, writing a single 512MB block is not particularly effective; you should try using benchmarking applications like bonnie++ or iozone. At the least, with dd, write >=2x RAM and repeat it a few times. – Kendall Jan 02 '12 at 23:04
  • I'm traveling so I can't try anything out until I get back, but I seem to get very different performance depending on whether it's a raw partition or formatted. The reason I used 512 MB was that I copied a dd line from the manual's section on performance tuning. Once I get back and set up to test again, I'll try larger sizes and those benchmarking tools. Thanks for the comments. – BHS Jan 04 '12 at 02:23
  • My wager is ext4's journal fsyncs are killing the performance. Can you try with an external journal or no journal at all (ext2)? – R. S. Jun 01 '13 at 21:01
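
For reference, R. S.'s suggestion in the last comment could be tried with something like the following; the DRBD device is the one from the question, and /dev/sdc1 is a hypothetical spare partition for an external journal:

mkfs.ext4 -O ^has_journal /dev/drbd2       # ext4 with no journal at all
# or move the journal to a separate device:
mke2fs -O journal_dev /dev/sdc1
mkfs.ext4 -J device=/dev/sdc1 /dev/drbd2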

1 Answer


Why are you limiting the bandwidth with "rate 33M"? And why are you using the synchronous protocol "C"?

I usually use protocol "A" and an 8 MB buffer. For a Gigabit line and heavy traffic I limit the rate to "90M".
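
For reference, that setup would look roughly like the following in DRBD 8.3 syntax; it's a sketch of what's described above, not a drop-in recommendation (see the comments below for what protocol A trades away):

common {
    protocol A;              # asynchronous: a write is complete once it reaches the local disk and the local TCP send buffer
    net {
        sndbuf-size 8M;      # the "8 MB buffer"
    }
    syncer {
        rate 90M;            # resync rate cap on a dedicated Gigabit link
    }
}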

Nils
  • Heavy traffic on a Gigabit line and you set rate to 90M?! You might want to double-check that. From the DRBD User Manual: "A good rule of thumb for this value is to use about 30% of the available replication bandwidth." The reason is that "rate" is the sync rate, i.e. how much bandwidth to use when syncing an Inconsistent device. Setting the sync rate too high can starve the target application of bandwidth. Additionally, the DRBD User Manual recommends 33M for a Gigabit line, which is likely why he chose that value. – Kendall Jan 01 '12 at 21:39
  • @Kendall - I don't replicate via an application interface. I use a dedicated interface/network for this kind of traffic. – Nils Jan 01 '12 at 21:52
  • As to the "why" on Protocol C: Protocol A leaves open the possibility of data loss, since the write is considered complete when the local disk flush has occurred and the replication packet hits the local send buffer. – Kendall Jan 01 '12 at 21:53
  • So you sync an Inconsistent disk over a _different_ network interface than the primary DRBD interface? – Kendall Jan 01 '12 at 21:57
  • The disk can be inconsistent on the target side of DRBD. This does not matter, since I do not run databases on the primary side, so the worst case is the loss of some logs. The servers have many interfaces used for different purposes. One interface exposes the service to the outside world; I use a DIFFERENT interface for the DRBD traffic. – Nils Jan 02 '12 at 22:04
  • I don't think we're quite seeing eye to eye here. So let's say DRBD uses eth0 on your system, that eth0 is a back-to-back GigE connection that does 110 MB/s, and your disk subsystem does 120 MB/s. Your total available bandwidth for DRBD is then 110 MB/s, split between whatever application is using DRBD and whatever re-syncing DRBD has to do. So with a sync rate of 90M, your DRBD application has only 20 MB/s available. If in normal operation you exceed this, then your sync rate is too high. – Kendall Jan 02 '12 at 23:02
  • OK, here is the setup: eth1 is the interface where the application (webserver or whatever) is contacted; it is also the interface where the default gateway is located. The corresponding database for the application is located on a different server, and traffic to/from the database goes through eth1. eth2 is the interface dedicated to heartbeat and DRBD traffic. The disk subsystem does 300 MB/s, and eth1/2 are Gigabit. In this setup there is no problem using almost the full bandwidth for DRBD. – Nils Jan 03 '12 at 20:30