How to make rdiff-backup bandwidth efficient?

Question

I am using rdiff-backup to back up files from my server to a backup server. I run the backup using a command similar to:

rdiff-backup user@example.com::/home/user/data/complete complete

This backup is working well. However, the features page of rdiff-backup says:

Bandwidth efficient: rdiff-backup depends on librsync, and thus uses the same diffing algorithm as rsync (rsync and rdiff-backup strictly speaking do not share any code however). [...] For instance, suppose you slightly alter large file A to make large file A', and A is still on the remote system. When rdiff-backup is run, it will only send over the diff A->A' [...]

The files on the remote are database dumps, generated using mysqldump, that are created hourly. The data does not change much from hour to hour. Each filename has the format YYYYMMDDHHMM.sql.

Based on my interpretation of the above 'feature', rdiff-backup should be sending a small diff to create the file based on the other files in the directory -- in other words, if A prime is the latest backup and A is the T-1 backup, it should send a small diff to get from A to A prime.

However, it quite clearly is not working in this way. It is sending the entire new file, even though the new file is only slightly different. I would expect data transfer to be a few megabytes, but it is transferring hundreds of megabytes.

Also from the man page:

rdiff-backup can also operate in a bandwidth efficient manner over a pipe, like rsync(1). Thus you can use ssh and rdiff-backup to securely back a hard drive up to a remote location, and only the differences will be transmitted. Using the default settings, rdiff-backup requires that the remote system accept ssh connections, and that rdiff-backup is installed in the user's PATH on the remote system. For information on other options, see the section on REMOTE OPERATION.

So my question is:

Am I interpreting this feature correctly?
If I am, how do I use rdiff-backup to function in this way?

score 1 · Answer 1 · answered Oct 29 '16 at 18:44

Per your post, you generate a unique SQL dump, YYYYMMDDHHMM.sql, every hour.
Each dump is a new file with a unique file name.
If you were changing existing files on the source (instead of generating new files), then this feature would apply.
As it stands, rdiff-backup looks at the source and discovers an entirely new file, YYYYMMDDHH+1MM.sql. It has no idea that the file named YYYYMMDDHHMM.sql on the destination is nearly identical, so it transfers YYYYMMDDHH+1MM.sql to the destination in full.
If you want to use this feature, then whenever a new file YYYYMMDDHH+1MM.sql is generated on the source, you would need to run a script that connects to the destination and makes a copy of YYYYMMDDHHMM.sql under the new name:

cp YYYYMMDDHHMM.sql YYYYMMDDHH+1MM.sql

After that, run your sync.
This way it will discover that the destination already has a file with the same name, and it will hopefully use the partial-sync algorithm.
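
The naming step in that script can be sketched roughly as follows. This is only an illustration in Python, not something from rdiff-backup itself: the helper names previous_dump_name and seed_command are hypothetical, and it assumes the dumps really are named YYYYMMDDHHMM.sql and are exactly one hour apart. It only builds the cp command to run in the destination directory; whether rdiff-backup then actually sends just a delta is not something this sketch verifies.

```python
from datetime import datetime, timedelta

# Timestamp layout taken from the question: YYYYMMDDHHMM
STAMP_FORMAT = "%Y%m%d%H%M"


def previous_dump_name(new_name, step_hours=1):
    """Return the name of the dump taken step_hours before new_name.

    Hypothetical helper: assumes names look like 201610291800.sql.
    """
    stem, dot, ext = new_name.rpartition(".")
    stamp = datetime.strptime(stem, STAMP_FORMAT)
    earlier = stamp - timedelta(hours=step_hours)
    return earlier.strftime(STAMP_FORMAT) + dot + ext


def seed_command(new_name):
    """Build the cp command that copies the previous dump to the new name."""
    return ["cp", previous_dump_name(new_name), new_name]


if __name__ == "__main__":
    # Prints the command to run on the destination before the sync.
    print(" ".join(seed_command("201610291800.sql")))
```

Going through datetime rather than adding 1 to the hour digits keeps the name correct across midnight and month boundaries.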
