In order to save space with backups, I want to know whether I can use any deduplication or whether I have to use a special implementation. In other words, say I use backup software X. If I ship those backups off to a server that has deduplication, like FreeNAS, can they later be restored, or do I need special software in my backup product to retrieve and restore the data? Some vendors, like Dell, sell special backup appliances with deduplication, but if I can build my own, I'd save a bunch of money.
2 Answers
If you use the deduplication built into a filesystem like ZFS (which FreeNAS can use), then any file-level backup tool (e.g. rsync) will not see the deduplication; it will copy the files as if they were stored normally.
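
To illustrate why file-level tools don't notice, here is a toy model of block-level deduplication in Python. It is only a sketch under simplifying assumptions (fixed-size blocks keyed by SHA-256, everything in memory), not how ZFS actually implements dedup, and `DedupBlockStore` is a hypothetical name. Identical blocks are stored once, but reading a file always reassembles exactly the bytes that were written.

```python
import hashlib


class DedupBlockStore:
    """Toy model of block-level deduplication (not ZFS's real implementation).

    Files are split into fixed-size blocks; each unique block is stored once,
    keyed by its SHA-256 digest. A file is just an ordered list of digests.
    """

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}     # digest -> block data (stored once)
        self.files: dict[str, list[str]] = {}  # filename -> ordered block digests

    def write(self, name: str, data: bytes) -> None:
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if an identical one is not already present.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name: str) -> bytes:
        # Reassemble the file: callers get back exactly the bytes they wrote,
        # so file-level tools never see that blocks are shared.
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


if __name__ == "__main__":
    store = DedupBlockStore(block_size=4)
    store.write("backup-mon.tar", b"AAAABBBBCCCC")
    store.write("backup-tue.tar", b"AAAABBBBDDDD")  # shares two blocks with Monday
    assert store.read("backup-tue.tar") == b"AAAABBBBDDDD"
    print(store.stored_bytes())  # 16 unique bytes stored for 24 bytes written
```

Restoring is just reading: the deduplication lives entirely below the read/write interface, which is why rsync or scp can copy the data back without any special support.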

- What products use deduplication? – johnny Mar 16 '16 at 18:53
- @johnny It's built into a variety of products; it can exist at the filesystem level or higher up the stack in an application. Do you mean something more specific? – zymhan Mar 16 '16 at 19:34
- No, I don't think so. I just don't want to pay for an expensive Dell box if I can build my own. – johnny Mar 18 '16 at 18:15
- In that case, you can take advantage of dedupe using something like FreeNAS without worrying about how the dedupe affects your files. As long as you copy them to and from the server with normal utilities (rsync, scp, etc.), it will act like a normal file server. Deduplication is hidden well beneath the covers; you'll likely notice it only in the space saved and the extra CPU cycles required. – zymhan Mar 18 '16 at 18:38

If you're running OS X or Linux, you could use a deduplicating backup program like BorgBackup (borg). It supports compression, uses a very good deduplication strategy, and offers client-side encryption. (I prefer to use LUKS encryption on the USB backup drives.)
It works well over SSH or when writing to an attached drive. I've also used it under Cygwin64 to back up files on a Windows 7 machine in the past.
The big advantage of deduplicating on the client is that only new or changed data is sent over the SSH connection to the remote host (assuming the remote host also has borg installed in its path). That makes it really efficient. Without borg installed on both the client and the server, it's still possible to back up via sftp:// or some other supported protocols, but the use of bandwidth is nowhere near as efficient.
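
A rough sketch of the client-side idea in Python, assuming a simplified protocol: the client hashes its chunks, asks the remote which chunk IDs it is missing, and sends only those. `RemoteRepo`, `backup` and `restore` are hypothetical names, not Borg's API, and the "network" is just a method call. Borg itself uses content-defined chunking and its own chunk-ID scheme; this sketch uses fixed-size chunks and plain SHA-256 for simplicity.

```python
import hashlib


class RemoteRepo:
    """Hypothetical stand-in for the remote side (e.g. borg running over SSH)."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def missing(self, ids: list[str]) -> set[str]:
        # Tell the client which chunk IDs the repository does not have yet.
        return {i for i in ids if i not in self.chunks}

    def put(self, chunk_id: str, data: bytes) -> None:
        self.chunks[chunk_id] = data


def chunk(data: bytes, size: int) -> list[bytes]:
    # Fixed-size chunking for simplicity; content-defined chunking (as in Borg)
    # keeps boundaries stable when bytes are inserted near the start of a file.
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data: bytes, repo: RemoteRepo, size: int = 4) -> tuple[list[str], int]:
    """Deduplicate on the client; return the archive's chunk IDs and bytes sent."""
    pieces = chunk(data, size)
    ids = [hashlib.sha256(p).hexdigest() for p in pieces]
    needed = repo.missing(ids)  # only IDs are exchanged at this point
    sent = 0
    for chunk_id, piece in zip(ids, pieces):
        if chunk_id in needed:
            repo.put(chunk_id, piece)
            needed.discard(chunk_id)  # don't send a repeated chunk twice
            sent += len(piece)
    return ids, sent


def restore(ids: list[str], repo: RemoteRepo) -> bytes:
    # Reassemble an archive from its ordered chunk IDs.
    return b"".join(repo.chunks[i] for i in ids)


if __name__ == "__main__":
    repo = RemoteRepo()
    _, first = backup(b"AAAABBBBCCCC", repo)
    ids, second = backup(b"AAAABBBBCCCCDDDD", repo)
    print(first, second)  # 12 4: the second run sends only the new chunk
    assert restore(ids, repo) == b"AAAABBBBCCCCDDDD"
```

The trade-off shown here is the one described above: the remote must be able to answer "which of these chunks do you already have?", which is why having borg on both ends saves so much bandwidth compared with a plain file-transfer protocol.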
