Every sysadmin knows that rsync is the way to go for backing up large files, e.g. a database file: thanks to its delta-transfer algorithm, it only sends the blocks that differ between the source file and the existing backup, which avoids a lot of overhead.
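As a rough illustration of the delta-transfer idea, here is a toy Python sketch. The old copy is cut into fixed-size blocks, each indexed by a cheap weak checksum (the rolling a/b sums modulo 2^16 described in the rsync technical report) plus a strong hash. A window then slides over the new file one byte at a time to find blocks that can be reused. The function names, the tiny 4-byte block size, and MD5 as the strong hash are assumptions made for the example; real rsync differs in many details (network protocol, block sizing, handling of the final partial block).

```python
import hashlib

M = 1 << 16  # modulus of the weak checksum, as in the rsync technical report


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Return the (a, b) pair of an rsync-style weak rolling checksum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one position: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def delta(old: bytes, new: bytes, block_size: int) -> list[tuple[str, object]]:
    """Describe `new` as ("copy", old_offset) and ("literal", bytes) operations."""
    # Signature of the old file: weak checksum -> [(strong hash, offset), ...].
    # This sketch ignores a trailing partial block of `old`.
    table: dict[tuple[int, int], list[tuple[bytes, int]]] = {}
    for off in range(0, len(old) - block_size + 1, block_size):
        blk = old[off:off + block_size]
        table.setdefault(weak_checksum(blk), []).append((hashlib.md5(blk).digest(), off))

    ops: list[tuple[str, object]] = []
    literal = bytearray()
    i, sums = 0, None
    while i + block_size <= len(new):
        window = new[i:i + block_size]
        if sums is None:
            sums = weak_checksum(window)
        # Cheap weak lookup first, confirmed by the strong hash.
        match = next((off for strong, off in table.get(sums, [])
                      if hashlib.md5(window).digest() == strong), None)
        if match is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += block_size
            sums = None  # recompute for the window right after the matched block
        else:
            literal.append(new[i])
            if i + block_size < len(new):
                sums = roll(*sums, new[i], new[i + block_size], block_size)
            i += 1
    literal.extend(new[i:])
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def patch(old: bytes, ops: list[tuple[str, object]], block_size: int) -> bytes:
    """Rebuild the new file from the old one plus the delta."""
    out = bytearray()
    for kind, value in ops:
        out += old[value:value + block_size] if kind == "copy" else value
    return bytes(out)


old = b"the quick brown fox jumps over the lazy dog"
new = b"the quick brown cat jumps over the lazy dog"
ops = delta(old, new, block_size=4)
print(ops)  # mostly ("copy", offset) entries plus a few short literals
```

The rolling update is what makes this cheap: moving the window by one byte costs a couple of additions instead of re-summing the whole block, so matches can be found at any offset, not only at block boundaries.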

Yet for certain file formats, such as gzip and bzip2, changing just a couple of bytes of the original data causes practically the whole compressed file to be transferred again: the change has a kind of butterfly effect on everything the compressor writes after it.
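To make the effect concrete, the following Python sketch inserts one byte near the start of a synthetic text file and counts how many 256-byte blocks of the new file occur anywhere in the old one, first for the raw bytes and then after compression with zlib (DEFLATE, the same compression format gzip uses). The word list, block size, and insertion offset are arbitrary choices for illustration, and matching "at any offset" is only a simplification of rsync's rolling search.

```python
import random
import zlib


def shared_fraction(old: bytes, new: bytes, block_size: int = 256) -> float:
    """Fraction of new's aligned blocks that occur at any offset in old,
    roughly what an rsync-style rolling match could reuse."""
    seen = {old[j:j + block_size] for j in range(len(old) - block_size + 1)}
    blocks = [new[i:i + block_size]
              for i in range(0, len(new) - block_size + 1, block_size)]
    return sum(b in seen for b in blocks) / len(blocks)


rng = random.Random(0)
words = [b"backup", b"rsync", b"delta", b"block", b"table", b"index",
         b"row", b"page", b"log", b"commit", b"file", b"disk"]
old = b" ".join(rng.choice(words) for _ in range(10_000))
new = old[:100] + b"X" + old[100:]  # insert a single byte near the start

raw = shared_fraction(old, new)
packed = shared_fraction(zlib.compress(old, 9), zlib.compress(new, 9))
print(f"uncompressed:    {raw:.2%} of blocks reusable")
print(f"zlib-compressed: {packed:.2%} of blocks reusable")
```

The uncompressed copy barely suffers, because every later block just shifts by one position. After compression, the bitstream following the edit is generally encoded differently, so far fewer blocks survive; the exact figures depend on the zlib version and the input.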

So, which compressed file formats are the most rsync-friendly? Conversely, are there formats that should be avoided when time is short and there is 10K of data to back up?

Did you know that recent versions of gzip have an --rsyncable option? From the manpage:

While compressing, synchronize the output occasionally based on the input. This increases size by less than 1 percent in most cases, but means that the rsync(1) program can take advantage of similarities in the uncompressed input when synchronizing two files compressed with this flag. gunzip cannot tell the difference between a compressed file created with this option, and one created without it.
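The man page only says the output is synchronized "based on the input". One way to picture that is content-defined chunking: choose cut points from a rolling hash of the last few dozen uncompressed bytes and compress each chunk independently, so an edit can only disturb the chunks around it. The Python sketch below is an illustration under stated assumptions, not how gzip implements the option: here each chunk is written as a separate gzip member (still valid gzip, since a file may contain several concatenated members), and the window size, mask, and hash constants are arbitrary, hypothetical choices.

```python
import gzip
import random

# Hypothetical parameters for this sketch; gzip's own constants differ.
WINDOW = 48               # bytes of context that decide a cut point
MASK = (1 << 12) - 1      # cut when the low 12 bits are zero (~4 KiB chunks)
BASE = 1_000_003
MOD = (1 << 61) - 1


def content_defined_chunks(data: bytes) -> list[bytes]:
    """Cut data where a polynomial rolling hash of the last WINDOW bytes
    matches a pattern, so cut points depend only on nearby content."""
    drop = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD
        if (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def compress_rsyncable(data: bytes) -> bytes:
    """Compress each chunk as its own gzip member; mtime=0 keeps headers reproducible."""
    return b"".join(gzip.compress(c, mtime=0) for c in content_defined_chunks(data))


rng = random.Random(1)
words = [b"backup", b"rsync", b"delta", b"block", b"table", b"index",
         b"row", b"page", b"log", b"commit", b"file", b"disk"]
old = b" ".join(rng.choice(words) for _ in range(30_000))
new = old[:500] + b"X" + old[501:]  # overwrite one byte near the start

old_members = {gzip.compress(c, mtime=0) for c in content_defined_chunks(old)}
new_members = [gzip.compress(c, mtime=0) for c in content_defined_chunks(new)]
reused = sum(m in old_members for m in new_members) / len(new_members)
print(f"{reused:.1%} of compressed chunks are byte-identical to the old ones")
```

Because a cut point depends only on the preceding 48 bytes, the cut points realign shortly after the edited byte, and every later chunk compresses to the same bytes as before, which is what rsync's block matching can reuse. The price is that the compressor forgets its history at every cut (and, in this variant, each member carries its own header and trailer), which is why the output grows slightly.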

Willem