
Last year I set up a pair of servers for my employer, running FreeBSD 10.1, each with a large pool of storage: 12 x 2TB disks in a zpool configured as two raidz2 vdevs of six disks each. One of these servers is a standby that holds a replica of the active one.

We would like to create a backup on some type of separate storage to guard against the non-failure kinds of failures, such as administrator error.

Preliminary testing suggests that simply using the standby replica server to 'zfs send' a deduplicated stream (-D) to some external storage would be adequate, but I can't find any information on whether the memory requirements of sending a deduplicated stream are the same as the requirements for using dedup in the first place.
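
Roughly what I have in mind is something like the following (the dataset, snapshot, and mount point names are just placeholders, not our real layout):

    # Take a recursive snapshot on the standby replica
    zfs snapshot -r tank/data@backup-20150507

    # Write a deduplicated replication stream to the external backup storage
    zfs send -D -R tank/data@backup-20150507 > /mnt/backup/tank-data-20150507.zstream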

Does zfs send -D have the same memory requirements for the DDT table as normal dedup with ZFS?

William S.

1 Answer


The question has been asked here.

And the answer is yes, it needs more memory, because it has to keep track of which blocks have already been sent and which haven't. The amount of memory should be proportional to the amount of data in the snapshot being transferred.
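
As a very rough illustration only (the per-block figure below is an assumption borrowed from common on-disk DDT sizing advice, not a documented number for the send-side table):

    # 10 TiB of unique data at the default 128 KiB recordsize
    echo $((10 * 1024 * 1024 * 1024 * 1024 / (128 * 1024)))   # ~83886080 blocks
    # assumed ~320 bytes of bookkeeping per block already sent
    echo $((83886080 * 320 / 1024 / 1024 / 1024))             # ~25 GiB of RAM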

Enabling dedup on the filesystem won't help the performance/memory requirements. Enabling SHA256 checksums will help performance a little, since those checksums can be reused when building the send stream.
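
If the source datasets don't already use SHA256, it can be enabled like this (only blocks written afterwards get the new checksum; 'tank/data' is a placeholder):

    # Switch the checksum algorithm so the send stream can reuse it for dedup
    zfs set checksum=sha256 tank/data
    zfs get checksum tank/data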

By the way, the real question is: do you make enough changes to the filesystem, have a slow enough link, and have data that deduplicates well enough that this would make a big difference for you and be worth considering at all?

Fox
  • We want to use a backup target smaller than the total pool size. If we can deduplicate and compress the stream, it will take up much less space on the backup. Can you elaborate on how SHA256 checksums will help? – William S. May 07 '15 at 22:51
  • From the discussion that was linked in the answer: "It does not use the pool's DDT, but it does use the SHA-256 checksums that have already been calculated for on-disk dedup, thus speeding the generation of the send stream." – Slizzered May 07 '15 at 23:36
  • @cathode If you just want to save space on the receiving end, just enable dedup there. You will have to have it enabled there anyway, and I am afraid that receiving a deduplicated stream won't help that much by itself. – Fox May 08 '15 at 08:47
  • What @Fox said. Deduplicating the send stream only reduces network bandwidth. It does *not* result in a deduplicated filesystem on disk at the other end. You'd need to have the dedup property set on the dataset you're receiving into. The same goes for compression, incidentally: a compressed stream gets decompressed when written to disk, and vice versa. – noitsbecky Jun 12 '15 at 21:38
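
To make the point in the last two comments concrete: dedup and compression have to be set on the dataset you receive into, and they then apply to the data as it lands on disk. A minimal sketch, with placeholder pool and dataset names:

    # On the backup pool: properties go on the receiving side, not on the stream
    zfs create backuppool/tank-backup
    zfs set dedup=on backuppool/tank-backup
    zfs set compression=lz4 backuppool/tank-backup

    # A plain send (no -p/-R) lets the new child dataset inherit those properties
    zfs send tank/data@backup-20150507 | zfs receive backuppool/tank-backup/data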