
I'm researching how we can implement near-real-time replication from a primary datacenter to a disaster recovery site. The data to be replicated would be:

  • Images of KVM VMs
  • MySQL and PostgreSQL databases

For the sake of simplicity, let's assume it's less than 10 TB of data in total, with an average write speed of under 100 MB/s, peaking at 1500 MB/s, and that the link between the primary and backup datacenters has a throughput of 10 Gbit/s.

Asynchronous replication is acceptable and desired: during bursty writes or a short connectivity outage between the datacenters we don't want to slow down local writes, and we are willing to lose the most recent portion of data if a catastrophic failure hits the primary datacenter.
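To put rough numbers on the buffering this implies, here is a minimal sizing sketch. The write rates, data size and link speed are the ones stated above; the 80% usable-link efficiency is purely an assumption, and real overhead will differ.

```python
# Rough sizing sketch. Rates and sizes come from the question;
# LINK_EFFICIENCY is an assumed figure, not a measured one.

LINK_GBIT_S = 10        # inter-datacenter link
LINK_EFFICIENCY = 0.8   # assumed usable fraction after protocol overhead
AVG_WRITE_MB_S = 100    # average write rate
PEAK_WRITE_MB_S = 1500  # peak write rate
TOTAL_DATA_TB = 10      # total data set


def link_mb_s() -> float:
    """Usable link throughput in MB/s (1 Gbit/s = 125 MB/s)."""
    return LINK_GBIT_S * 125 * LINK_EFFICIENCY


def burst_backlog_gb(burst_seconds: float) -> float:
    """Data queued at the source while peak writes exceed the link, in GB."""
    excess = max(0.0, PEAK_WRITE_MB_S - link_mb_s())
    return excess * burst_seconds / 1000


def outage_backlog_gb(outage_seconds: float) -> float:
    """Data queued during a link outage at the average write rate, in GB."""
    return AVG_WRITE_MB_S * outage_seconds / 1000


def drain_seconds(backlog_gb: float) -> float:
    """Time to drain a backlog once writes are back at the average rate."""
    spare = link_mb_s() - AVG_WRITE_MB_S
    return backlog_gb * 1000 / spare


def initial_sync_hours() -> float:
    """Time for the first full copy of the data set over the link."""
    return TOTAL_DATA_TB * 1_000_000 / link_mb_s() / 3600


if __name__ == "__main__":
    print(f"usable link: {link_mb_s():.0f} MB/s")
    print(f"60 s peak burst queues {burst_backlog_gb(60):.0f} GB")
    print(f"10 min outage queues {outage_backlog_gb(600):.0f} GB")
    print(f"initial sync takes about {initial_sync_hours():.1f} h")
```

Under these assumptions the peak write rate exceeds what the link can carry, so any asynchronous scheme needs a local buffer or log large enough to absorb the difference for the length of the burst.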

My understanding is that we can choose between:

  • Proprietary SAN hardware that comes with a replication feature and can provide iSCSI LUNs
  • DRBD, which will likely need to be combined with DRBD Proxy (to make sure that a temporary drop in available bandwidth or a latency spike between the datacenters does not affect write performance at the source)
  • Software-based solutions like MARS (http://schoebel.github.io/mars/), which, sadly, will take quite a while to be merged into the mainline kernel even in the best-case scenario
  • For databases, database-level replication is an option as well, but we'd like to carry out occasional DR tests in which we switch all workloads between the datacenters, and failing back from the DR site to the main site would be quite cumbersome

Are there any other solutions worth considering?

Thank you!

pQd
  • Software recommendations are off topic here - and you have already found the alternative (MARS). I never found a suitable alternative to DRBD+drbd-proxy ($$) but you should look at ZFS streaming. There are also some hacks around LVM for block replication out there as well. – davidgo May 31 '20 at 21:28
  • @davidgo - you linked to *this* question – warren Jun 01 '20 at 19:31

2 Answers


For databases and database-aware applications, native replication is always preferable to generic block-level replication for many reasons, database consistency being one of them. So use SQL Server Availability Groups (AGs, available with some limitations even in the Standard edition since SQL Server 2016), MS Exchange DAG, SAP HANA, Aerospike replicas, and so on.

I wouldn't use DRBD in 2020 because of its rather poor I/O performance, especially in all-flash configurations (DRBD was clearly designed in the early 2000s for spinning disks and high-latency non-RDMA networks), and because of its extremely poor protection against split-brain issues. The virtual SAN technology that ships with the major hypervisors is another good alternative to the database's built-in replication.

BaronSamedi1958
  • As a VSAN, OP might consider StarWind VSAN. Their Linux edition might be helpful; the Windows version still works great. https://www.starwindsoftware.com/starwind-virtual-san-goes-linux – Stuka Jun 21 '20 at 12:58

You could consider geo-replicated SDS solutions like Gluster and Ceph, or use ZFS or LVM replication.

For KVM, QEMU now has a CDC feature, and various solutions are being built on it to stream only the changed blocks across a network link instead of replicating the entire underlying block device.
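To illustrate the idea of shipping only changed blocks, here is a minimal sketch that compares per-chunk SHA-256 digests of two versions of an image. This is not QEMU's mechanism: it rescans the whole image, whereas hypervisor-level tracking records changes as writes happen. The chunk size and all function names are hypothetical, and the sketch does not handle an image that shrinks.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB chunk size


def chunk_digests(image: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Return one SHA-256 digest per fixed-size chunk of the image."""
    return [
        hashlib.sha256(image[off:off + chunk_size]).digest()
        for off in range(0, len(image), chunk_size)
    ]


def changed_chunks(
    old_digests: List[bytes], new_image: bytes, chunk_size: int = CHUNK_SIZE
) -> Dict[int, bytes]:
    """Return {chunk_index: data} for chunks that are new or differ."""
    delta: Dict[int, bytes] = {}
    for idx, off in enumerate(range(0, len(new_image), chunk_size)):
        data = new_image[off:off + chunk_size]
        is_new = idx >= len(old_digests)
        if is_new or hashlib.sha256(data).digest() != old_digests[idx]:
            delta[idx] = data
    return delta


def apply_delta(
    replica: bytearray, delta: Dict[int, bytes], chunk_size: int = CHUNK_SIZE
) -> None:
    """Write the changed chunks into the replica in place, in index order."""
    for idx in sorted(delta):
        data = delta[idx]
        off = idx * chunk_size
        replica[off:off + len(data)] = data


if __name__ == "__main__":
    old = bytes(16)
    new = bytearray(old)
    new[5] = 0xFF                       # touch one byte in the second chunk
    delta = changed_chunks(chunk_digests(old, 4), bytes(new), 4)
    replica = bytearray(old)
    apply_delta(replica, delta, 4)
    print(sorted(delta), replica == new)
```

Only the dictionary returned by `changed_chunks` would need to cross the link, which is why change tracking matters when the link is the bottleneck.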

For any other software (you mentioned databases) it is really a better approach to use the native tools your database might provide. Many modern NoSQL databases are masterless and can simply run in multi-DC mode, with replicas pushed per DC or rack.
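With asynchronous native replication it helps to know how far the replica is behind. As a small sketch for PostgreSQL, the helper below turns two WAL positions (LSNs, written as two hexadecimal halves separated by a slash) into a lag in bytes. It assumes the values were already fetched, for example from `pg_current_wal_lsn()` on the primary and `replay_lsn` in `pg_stat_replication`; the function names here are hypothetical.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after the lower 32.
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica has not replayed yet (never negative)."""
    return max(0, parse_lsn(primary_lsn) - parse_lsn(replica_lsn))


if __name__ == "__main__":
    # Illustrative values only, not taken from a real server.
    print(replication_lag_bytes("1/0", "0/FFFFFFF0"))
```

Dividing this lag by the write rate gives a rough idea of how much recent data would be lost if the primary site failed at that moment.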

dyasny
  • Both Ceph and Gluster are overkill for OP's basic two-node scenario. ZFS is OK, but doing it inside a VM is kind of a challenge... – RiGiD5 Jun 01 '20 at 16:05
  • My answer contains way more than just a brief mention of ceph and gluster. – dyasny Jun 01 '20 at 23:47