
I back up my laptop to a Fedora desktop daily using rsync with hard links. This has worked great for almost a year.

I recently purchased a new computer, transferred over my data, and would like to continue backing up this computer daily.

However, due to the data transfer from the old laptop to the new laptop, the timestamps have obviously changed, and will thus cause my daily rsync backup to re-transfer all of the data.

I thought that by adding the -c (checksum) switch to my rsync backup it would match files based on checksum, instead of timestamp and size, and only transfer those files that are different or not present. This appeared to work, but upon examining the new backup, hard links are not being created, and it appears the files that should be hard linked are simply being copied to the new backup directory from the previous backup directory on the backup server. This is very peculiar behavior to me, and I am having trouble figuring out why this is occurring. Checksums match for files that I think should be hard linked.

I have looked through the rsync man page and Google'd around a bit and have been unable to find anything for me to better understand this behavior.

user75058

2 Answers


I think you are misunderstanding both the checksum and hard link options.

The --checksum option is described in the man page as "skip based on checksum, not mod-time & size". It means that mod time and size are basically ignored when deciding what to skip, but it also means that all files are read on both sides (because rsync has to read each file to compute its checksum).

It's important to realize that rsync already reads and checksums a file whenever its mod time or size differs, as part of working out what data to send. So --checksum causes much more work than running without it, because every file is read on both sides, not just the ones whose mod time or size changed. As said above, the option only influences which files are skipped.

--checksum is typically used in backup scripts for the equivalent of a "full backup", say once a month. This ensures that any file which has changed in such a way that its mod time and size remain the same still gets correctly backed up.
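A small sketch of that difference, using throwaway files in a temp directory (the file name and contents are made up for the demo). The content is changed while size and mod time are kept the same, so the default quick check should skip the file and --checksum should catch it:

```shell
#!/bin/sh
# Demo only: everything lives in a throwaway temp directory.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

demo=$(mktemp -d)
mkdir "$demo/src" "$demo/dst"
printf 'aaaa' > "$demo/src/f"
rsync -a "$demo/src/" "$demo/dst/"

# Change the content but keep the size, then restore the original mod time.
printf 'bbbb' > "$demo/src/f"
touch -r "$demo/dst/f" "$demo/src/f"

# Quick check (size + mod time) sees no difference, so the file is skipped.
rsync -a "$demo/src/" "$demo/dst/"
after_quick=$(cat "$demo/dst/f")

# --checksum reads both copies, notices the difference, and transfers the file.
rsync -a --checksum "$demo/src/" "$demo/dst/"
after_checksum=$(cat "$demo/dst/f")

echo "after quick check: $after_quick, after --checksum: $after_checksum"
```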

The --hard-links option (from the man page): "This tells rsync to look for hard-linked files in the transfer". Note that it is only in the transfer, so it won't detect that you have an existing copy of the data on the rsync server, in another location, and hard link it. It only links files that are being transferred with other files that have previously been transferred.
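To see what "in the transfer" means, here is a rough sketch with made-up file names in a temp directory. Two names in the source point at the same file; without -H the destination should get two independent copies, and with -H the link should be recreated:

```shell
#!/bin/sh
# Demo only: everything lives in a throwaway temp directory.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

demo=$(mktemp -d)
mkdir "$demo/src" "$demo/plain" "$demo/linked"
echo "data" > "$demo/src/a"
ln "$demo/src/a" "$demo/src/b"    # a and b are hard links to one file

rsync -a    "$demo/src/" "$demo/plain/"     # no -H: links are not looked for
rsync -a -H "$demo/src/" "$demo/linked/"    # -H: links inside the transfer are kept

# Helper for the demo: prints "yes" if both paths share an inode (GNU stat).
same_inode() {
    if [ "$(stat -c %i "$1")" = "$(stat -c %i "$2")" ]; then echo yes; else echo no; fi
}

without_H=$(same_inode "$demo/plain/a" "$demo/plain/b")
with_H=$(same_inode "$demo/linked/a" "$demo/linked/b")
echo "linked without -H: $without_H, linked with -H: $with_H"
```

Neither run looks at any other directory on the destination, which is why -H alone does not link a new backup to an older one.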

So, if you want the new laptop's backup directory to be hard-linked to the old laptop's backup directory, you will need to remove the new laptop's backup directory, and re-create it using hard links (say, via cp -al). However, if all of your file dates have changed, you're likely going to run into issues with rsync re-transferring these files and breaking those hard links. You'd first probably need to rsync the one laptop to the other, being careful not to rsync over data that truly needs to be different between them. That way the files should have the same dates, and that will make your rsync backups happier.
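For reference, a minimal sketch of what cp -al does, assuming GNU cp and stat. The directory names old-laptop and new-laptop are placeholders for the two backup directories, created here under a temp directory so nothing real is touched:

```shell
#!/bin/sh
# Demo only: placeholder directories under a throwaway temp dir.
demo=$(mktemp -d)
mkdir -p "$demo/old-laptop/docs"
echo "hello" > "$demo/old-laptop/docs/a.txt"

# -a preserves attributes and recurses; -l makes hard links instead of copying data.
cp -al "$demo/old-laptop" "$demo/new-laptop"

# The same inode number on both paths means there is one copy of the data on disk.
old_inode=$(stat -c %i "$demo/old-laptop/docs/a.txt")
new_inode=$(stat -c %i "$demo/new-laptop/docs/a.txt")
link_count=$(stat -c %h "$demo/new-laptop/docs/a.txt")
echo "old inode: $old_inode, new inode: $new_inode, link count: $link_count"
```

Hard links only work within a single filesystem, so both directories have to live on the same one.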

I know you've said you read the man page, but I'd encourage you to look at it again, specifically the detailed descriptions of the --checksum and --hard-links options. You should probably also read about the --inplace option, as it may interact badly if you are trying to preserve hard links.

Sean Reifschneider
  • Thank you for the thorough response; it helped me better understand a few things. To keep things simple, I think I am going to create a separate backup directory for this laptop and perform differential backups daily using hard links. If anyone is interested, this is the backup script I use: http://pastebin.com/BTPwcPqf A large part of that script was derived from here: http://blog.interlinked.org/tutorials/rsync_time_machine.html – user75058 Mar 19 '11 at 16:38

As you expected, --checksum avoids re-transferring the files and uses the existing backup as a reference.

--link-dest will hard link your new backup with the older one, hence reducing disk usage.

But if the timestamps of the original files differ from those in the older backup, the hard links are impossible: hard links to the same file share a single inode, so they cannot have different timestamps, ownership, or permissions. So you end up with no hard links.
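A rough way to reproduce this, with made-up directory names (src, backup.1, and so on) in a temp directory. The first run backs up a file whose mod time matches the older backup; the second simulates the migrated laptop by giving the same content a new mod time:

```shell
#!/bin/sh
# Demo only: everything lives in a throwaway temp directory.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

demo=$(mktemp -d)
mkdir "$demo/src" "$demo/backup.1" "$demo/backup.2" "$demo/backup.3"
echo "data" > "$demo/src/f"
rsync -a "$demo/src/" "$demo/backup.1/"    # the "older backup"

# Content and mod time both match backup.1, so the file can be hard linked.
rsync -a --checksum --link-dest="$demo/backup.1" "$demo/src/" "$demo/backup.2/"

# Same content, different mod time (as after moving data to a new machine).
touch -d '2011-03-19 12:00:00' "$demo/src/f"
rsync -a --checksum --link-dest="$demo/backup.1" "$demo/src/" "$demo/backup.3/"

# Helper for the demo: prints "yes" if both paths share an inode (GNU stat).
same_inode() {
    if [ "$(stat -c %i "$1")" = "$(stat -c %i "$2")" ]; then echo yes; else echo no; fi
}

linked_same_mtime=$(same_inode "$demo/backup.1/f" "$demo/backup.2/f")
linked_new_mtime=$(same_inode "$demo/backup.1/f" "$demo/backup.3/f")
echo "linked with same mtime: $linked_same_mtime, with new mtime: $linked_new_mtime"
```

In the second run the checksum matches, so nothing needs to be sent over the network, but because -a preserves mod times the new file cannot share an inode with the old one and ends up as a separate copy.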

The workaround is to use fdupes -r1L /folder1 /folder2, which replaces the files in folder2 with hard links, the only condition being that the content is identical.

BTW, see also my shell script for snapshot backups of your full filesystem using rsync with hard links between the backups (deduplication), so that a full backup takes as little disk space as an incremental one. It comes with tuning settings such as MD5 integrity signatures, 'chattr' protection, filter rules, disk quota, and a retention policy with exponential distribution (backup rotation that keeps more recent backups than older ones). It is freely available: http://blog.pointsoftware.ch/index.php/howto-local-and-remote-snapshot-backup-using-rsync-with-hard-links/

cheers Francois Scheurer