How to create a ZFS-On-Linux pool replication (main -> backup)

Question

I am currently running two identical servers with the same hard disks.
Both have a ZFS pool on sda and sdb (raidz) named /pool.
Both run Ubuntu Linux Server.
Both are supposed to operate on the vanilla toolset (no fancy third-party backup packages).

I am now struggling to find a clear, straightforward, step-by-step how-to for creating an automatic backup from main to backup.

Everything I have found was either old (2010-2016), so I am unsure whether it is still up to date, or explains only certain details rather than the whole process, or discusses additional tools, wrappers and scripts instead of the process itself using the default ZFS toolset.

What is the best practice here?
Am I running a cron job for this or can zfs do the backup automatically?
Transfer via ssh or rsync?
zfs send piped into ssh/rsync?
I want to run it on a local network, but in case I later move the backup off-site: could I follow the same best practice over the internet, from one dedicated server hosting provider to another, or would this need a totally different approach?
Say I connected both servers directly via a dedicated interface, bypassing the LAN. I would then be fine with unencrypted data transfer without ssh to reduce encryption overhead. What would be the best utility for that?

score 2 · Answer 1 · answered Jun 19 '19 at 14:40

Easy way: simply use syncoid and call the job done.
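
As a rough sketch of the easy way: syncoid (part of the sanoid project) takes a source dataset and a target, and the target may be remote in user@host:dataset form. The dataset pool/data, the login root@backup and the script path mentioned afterwards are assumptions for illustration, not values from the question. The script only prints the command unless it is called with the argument run.

```shell
#!/bin/sh
# Sketch: one syncoid run from main to backup.
# SRC and TARGET are placeholders (assumptions); adjust them to your setup.
SRC="pool/data"                 # dataset on main
TARGET="root@backup:pool/data"  # ssh login and dataset on backup

# Build the command line; --recursive also replicates child datasets.
syncoid_cmd() {
    echo "syncoid --recursive $SRC $TARGET"
}

# Print the command by default; pass "run" as first argument to execute it.
if [ "${1:-}" = "run" ]; then
    eval "$(syncoid_cmd)"
else
    syncoid_cmd
fi
```

Called hourly from root's crontab, for example as 0 * * * * /usr/local/sbin/zfs-replicate.sh run (a hypothetical path), this would keep the backup roughly an hour behind main.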

Harder/longer way: you need to use incremental zfs send / recv. As it has multiple modes of operation, I do not think it can be covered extensively in a concise answer. Suffice it to say that you need a first, full zfs send | zfs recv, followed by regular incremental ones. I would point you to the Oracle docs for more details.
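
A minimal sketch of that full-then-incremental sequence, using only the stock zfs and ssh tools. The names are assumptions, not taken from the question: dataset pool/data on both sides, ssh login root@backup, and caller-supplied snapshot names. DRY_RUN defaults to 1, so the functions only print what they would run.

```shell
#!/bin/sh
# Sketch: replicate one dataset from main to backup with zfs send | zfs recv.
# All names are placeholders (assumptions): adjust SRC, DST and REMOTE.
SRC="${SRC:-pool/data}"          # source dataset on main
DST="${DST:-pool/data}"          # target dataset on backup
REMOTE="${REMOTE:-root@backup}"  # ssh login on the backup server
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print commands, 0 = execute them

# Print the command line, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || sh -c "$*"
}

# First run: take a snapshot and send the whole dataset.
# recv -F overwrites whatever the target dataset currently holds.
replicate_full() {
    new="$1"
    run "zfs snapshot $SRC@$new"
    run "zfs send $SRC@$new | ssh $REMOTE zfs recv -F $DST"
}

# Later runs: send only the changes between the previous and the new snapshot.
# The target must still have $prev and be unmodified since then.
replicate_incremental() {
    prev="$1"
    new="$2"
    run "zfs snapshot $SRC@$new"
    run "zfs send -i $SRC@$prev $SRC@$new | ssh $REMOTE zfs recv $DST"
}
```

Typical use would be replicate_full repl-1 once, then replicate_incremental repl-1 repl-2, replicate_incremental repl-2 repl-3 and so on from a scheduler such as cron, since zfs send itself does not run on a schedule.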

In both cases, be sure to put the data to be synced into a dedicated child dataset rather than using the root dataset (i.e. put your data in pool/data rather than directly in pool).
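
A short sketch of that layout step. The pool name pool comes from the question; the child name data is only an example. By default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: keep the data in a child dataset instead of the pool's root dataset.
# "pool" is the pool name from the question; "data" is an assumed child name.
POOL="pool"
CHILD="data"
DATASET="$POOL/$CHILD"

# Show the commands; set DRY_RUN=0 to execute them (as root).
for cmd in "zfs create $DATASET" "zfs list -r $POOL"; do
    echo "+ $cmd"
    [ "${DRY_RUN:-1}" = "1" ] || $cmd
done
```

Existing files would then be moved into the new dataset's mountpoint, and pool/data becomes the dataset that gets replicated.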

score 1 · Answer 2 · answered Jun 21 '19 at 13:56

Here is an OpenSVC service running a KVM guest named mywin. Every hour it replicates (zfs send | zfs receive) the ZFS dataset data/mywin from the primary node srv1 to the ZFS dataset data/mywin on the secondary node srv2:

root@srv1:~# om mywin print config
[DEFAULT]
env = PRD
nodes = srv1.acme.com srv2.acme.com
id = cd6e0bfa-4096-4249-899a-c8cd90a8979b

[sync#1]
src = data/{svcname}
dst = data/{svcname}
type = zfs
target = nodes
recursive = true
schedule = @60

[fs#1]
mnt_opt = rw,xattr,acl
mnt = /srv/{svcname}
dev = data/{svcname}
type = zfs

[container#0]
type = kvm
name = {svcname}
shared = true

Adjust it to your environment and remove the specific dataset name; the replication should then cover the whole pool.

You can manually trigger the replication with the command om mywin sync nodes

PS: ensure that mutual root ssh trust is set up between the two nodes.
