
I am looking for ways to create some kind of file replication without losing too much performance on file operations. A real RAID is not an option due to non-technical constraints.

As far as I understand, DRBD tries to act like a real RAID1, distributing the changes immediately. From what I read, this only makes sense if the servers are really close to each other (same rack). However, I can live with a certain lag (e.g. 15-30 minutes) in the replication of the filesystem and with a partial loss of data in case of a disk failure.

Can you point me in some direction? Is there a non-realtime file replication solution? Or would I be better off simply calling rsync over and over again? Are there any benchmarks comparing DRBD at different latencies with software RAID systems?

Martin
  • How slow or limited is the network? If there aren't many writes and the network latency isn't a problem when writing, DRBD might still work for you. You don't know until you've benchmarked it. And if you decide to go the rsync route, you probably don't have to write the script yourself: http://code.google.com/p/lsyncd/ – ptman Jan 26 '12 at 06:03
  • @ptman Currently the relevant servers are about 10ms and 7 hops away, in the near future it will be 0.5ms and 4 hops. But the bandwidth during peak is probably max. 10 Mbit/s or even less. (and thank you very much for the lsyncd link) – Martin Jan 26 '12 at 13:45

4 Answers


Two more ideas for you:

  • Use DRBD with protocol "A" (asynchronous mode) and turn up the send buffers (the maximum is about 8 MB). This allows DRBD to lag behind a little.
  • Use rsync, but run an rsync daemon (server mode) on your targets. That way the checksumming process will speed up. A rough sketch of both ideas follows below.
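
A rough sketch of both ideas, assuming DRBD 8.x syntax; the resource name, hostnames, devices and the rsync module below are placeholders, not from your setup:

    # /etc/drbd.d/r0.res -- asynchronous replication with a larger send buffer
    resource r0 {
        protocol A;
        net {
            sndbuf-size 8M;
        }
        on alpha {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.0.1:7789;
            meta-disk internal;
        }
        on beta {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.0.2:7789;
            meta-disk internal;
        }
    }

    # rsync against a target running rsyncd (module "backup" defined in its rsyncd.conf)
    rsync -av --delete /home/ rsync://beta.example.org/backup/home/
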
Nils

You could also use NBD with mdadm. I am currently evaluating a similar scenario for a client, but I have not come around to doing benchmarks yet.
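
The rough idea looks like this, with made-up hosts, ports and devices (the nbd-server invocation differs between versions):

    # on the remote machine: export a spare partition over NBD
    nbd-server 2000 /dev/sdb1

    # on the local machine: attach the export and mirror onto it
    nbd-client remote.example.org 2000 /dev/nbd0
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          --bitmap=internal /dev/sda2 --write-mostly /dev/nbd0

    # --write-mostly keeps reads on the local disk; the internal bitmap speeds up
    # resync after the network link drops.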

Niko S P
  • Thank you Niko. I did not know about this. I found the following link http://files.calum.org/network-raid.html that gave me a quick understanding of your idea. However, I am sceptical about the benefit compared to periodic rsync runs: it writes over the network permanently, but does not offer advantages like concurrent access http://www.drbd.org/users-guide/ch-gfs.html (have not tested it yet). – Martin Jan 26 '12 at 14:02

Possibly GlusterFS will be a solution. http://www.gluster.org/

In my experience it's capable of coping with slow networks well enough.
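
For reference, a two-node replicated volume is set up roughly like this with Gluster 3.x (hostnames and brick paths are placeholders):

    # on server1
    gluster peer probe server2.example.org
    gluster volume create myvol replica 2 \
        server1.example.org:/export/brick1 server2.example.org:/export/brick1
    gluster volume start myvol

    # on any client
    mount -t glusterfs server1.example.org:/myvol /mnt/shared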

favoretti
  • It says in the minimum requirements that a 1 Gbit network or faster should be used, but I will give it a try. Concurrent access would make things much easier. – Martin Jan 27 '12 at 13:06
  • Sure, but I don't think that's a hard requirement; we did use it on slow networks and it worked alright-ish. – favoretti Jan 29 '12 at 12:44
  • Some time has passed, and now I was finally able to evaluate it properly. I decided not to use GlusterFS (3.2). On the plus side, it is super easy to set up, and the write performance was consistent with the 100 Mbit network being the bottleneck. But the read performance was bad, given a replication setup. – Martin May 16 '12 at 19:28

If lag and some loss of data are not a concern, you could write your own small rsync script, something like:

rsync -av --delete /etc /root /home /usr /var /opt user@nfs.example.org:/

And run it every 15 minutes. However, it may be too slow in gathering and transmitting all the data and not yet be done before the next run starts. Once rsync has run at least once, though, subsequent runs are quite fast.
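
If a run can take longer than 15 minutes, you can stop overlapping runs with flock; a minimal cron entry might look like this (the lock file path is just an example):

    # /etc/cron.d/replicate -- skip the run if the previous one still holds the lock
    */15 * * * * root flock -n /var/run/replicate.lock rsync -av --delete /etc /root /home /usr /var /opt user@nfs.example.org:/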

You can also try rsnapshot: http://www.debian-administration.org/articles/217

"Like many backup solutions rsnapshot is a script which is built upon a foundation of OpenSSH and Rsync - the latter being used to synchronise directory contents without using excessive bandwidth, and the former to ensure the communication is encrypted and secure."

aseq