1

I want to make incremental backups on a Linux machine, in the following way:

BACKUP1                BACKUP2
|                      |
|                      |
|--- file1             |--- file1 (symlink to file1 in backup1 because it hasn't changed)
|                      |
|                      |
|--- file2             |--- file2 (is copied again because it has changed)
|                      |
|                      |
|--- file3             |--- file3 (same as file1, a symlink)

Is there any simple way of doing this? I was using this script:

#!/bin/sh

date=`date "+%Y-%m-%dT%H:%M:%S"`
rsync -aP --link-dest=~/Backups/current ~/Documents ~/Backups/back-$date
rm -f ~/Backups/current
ln -s back-$date ~/Backups/current

But that just copies everything again. Thanks :D

pmerino
  • FYI: --link-dest produces hard links, not symlinks. – andol Aug 22 '11 at 17:37
  • I prefer using something like time capsule or windows 2008 backup system – pmerino Aug 22 '11 at 17:39
  • @andol: how would I produce symlinks instead? – pmerino Aug 22 '11 at 17:41
  • So, do you want to use hard links or symlinks? Your example mentions symlinks, but then you mention Time Capsule, which, assuming OS X and Time Machine, implies hard links. – andol Aug 22 '11 at 17:43
  • Any particular reason you want to go with symlinks instead of hard links? – andol Aug 22 '11 at 17:45
  • Well, that script I'm using makes multiple copies (I think), but every backup weighs the same (or more, depending on the files in the backup). I guess I still don't understand hard links; I'm a bit of a noob at this – pmerino Aug 22 '11 at 17:47
  • I think [dirvish](http://www.dirvish.org/) is the tool you want. It is basically a front-end for rsync. – Zoredache Aug 22 '11 at 18:16
  • I guess I don't understand... Your script IS doing almost exactly what Time Machine IS doing... You're creating hard links to files that have not changed and only adding new ones... This creates a structure of directories that mimics the area you're trying to back up, with an additional directory for every time you do the backup. It doesn't copy all of the files every time... just the new ones... You want to be doing EXACTLY what you are doing... – g19fanatic Sep 07 '11 at 17:43

3 Answers

3

I use a self-written bash script for this, built on rsync and cpio: http://pastebin.com/uRdH2uQf

The first thing I did was create a directory structure. I work like this: a backup is created every day; on the 7th day (Sunday) I take the last backup (from the previous week's Sunday) and put it in weekly. Every 4 weeks I keep a monthly backup.

All these backups are incremental and based on 1 full backup.

My directory structure is based in /mnt/backups and looks like this:

--- SERVER1
    |--- daily
    |    |--- 0
    |    |--- 1
    |    |--- 2
    |    |--- 3
    |    |--- 4
    |    |--- 5
    |    |--- 6
    |--- weekly
    |    |--- 0
    |    |--- 1
    |    |--- 2
    |    |--- 3
    |--- monthly
         |--- 0
         |--- 1
         |--- 2
         |--- 3
         |--- 4
         |--- 5
         |--- 6
         |--- 7
         |--- 8
         |--- 9
         |--- 10

I also use a script to quickly create this structure: http://pastebin.com/LyFLBZGx

All my scripts are located in /root/backup_tools. The backup.sh script is run from crontab every day. SSH key-based authentication is set up from my backup server to every server I need to back up. In my tools dir I place my exclude files (the folders and files I do not want to back up), named in this format:

rsync.exclude.server1

These files list the directories that should not be backed up:

/proc
/sys
/tmp

I also use my ~/.ssh/config file to define the hosts (for example, server1.example.com is defined as server1, with SSH port xxxx and username foo). This makes it a lot easier to add the servers to back up in the first line of the script.

Host server1
        User root
        Port 31337
        Hostname server1.example.com

The script reads the SERVERS="" line and, for every server listed there (space-separated), starts an incremental backup, excluding all the directories listed in that server's exclude file.
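The loop described above could look roughly like the sketch below. This is not the pastebin script: the rsync options, the choice of daily/0 as the target, and the DRY_RUN switch are assumptions made for illustration; only the paths and the exclude-file naming come from the description. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only, NOT the author's backup.sh. Option choices are assumptions.
SERVERS="server1 server2"          # space-separated host aliases from ~/.ssh/config
TOOLS_DIR=/root/backup_tools       # where the rsync.exclude.<server> files live
BACKUP_ROOT=/mnt/backups
DRY_RUN=${DRY_RUN:-1}              # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

backup_all() {
    for server in $SERVERS; do
        # The directory layout uses upper-case names (SERVER1); the exclude
        # files use the lower-case alias (rsync.exclude.server1).
        name=$(printf '%s' "$server" | tr '[:lower:]' '[:upper:]')
        run rsync -a --delete \
            --exclude-from="$TOOLS_DIR/rsync.exclude.$server" \
            "$server:/" "$BACKUP_ROOT/$name/daily/0/"
    done
}

backup_all
```

Running it with DRY_RUN=0 would execute the rsync commands instead of printing them, so check the printed commands first.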

It uses cpio to rotate the directories. cpio can copy a tree by creating hard links to the data that is already on disk, so a file shows up twice on your hard drive but only uses space once. It's not a symlink either: when you delete the original file, the duplicate is still readable.

I hope this was somewhat clear. The bash script is not perfect, but it does its job. I use it to back up 4 servers every night. I have a couple of months of backups now and they are not big. It really saves space.
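As a rough illustration of that rotation step, the sketch below hard-links one snapshot directory into another using cpio's pass-through mode. The function name link_copy is hypothetical, and the fallback to GNU cp -al when cpio is not installed is a substitution added here, not part of the original script. It assumes file names without newlines.

```shell
#!/bin/sh
# Hypothetical helper: create a hard-linked copy of the tree in $1 at $2.
link_copy() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # Make the destination absolute, because we cd into the source below.
    dst_abs=$(cd "$dst" && pwd) || return 1
    if command -v cpio >/dev/null 2>&1; then
        # -p pass-through, -d create directories as needed,
        # -l hard-link files instead of copying, -m keep modification times
        (cd "$src" && find . -depth -print | cpio -pdlm --quiet "$dst_abs")
    else
        # Substitute technique (GNU cp): archive mode, linking instead of copying
        cp -al "$src/." "$dst_abs/"
    fi
}
```

For example, link_copy /mnt/backups/SERVER1/daily/0 /mnt/backups/SERVER1/daily/1 would give daily/1 the same files as daily/0 without using extra space for file contents.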

Goez
  • 1,838
  • 1
  • 11
  • 15
2

In your example, you mention symlinks, however rsync deals in hard links. You mention in the comments that you're unsure what links are, so the Reader's Digest version is:

  • A symlink is like a "shortcut" in Windows -- it quite simply tells you "the file you want is over there"
  • A hard link has no direct counterpart in Windows -- at least not in common usage. A hard link is quite literally another "entry point" to the same file; it appears to the file system to be an exact duplicate of the linked file, however on the physical disk there is only one copy of the file, no matter how many hard links there are.
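The difference between the two kinds of link can be seen with a small experiment in a throwaway directory. This is a minimal sketch; the file names are made up, and stat -c is the GNU form found on Linux.

```shell
#!/bin/sh
# Demo in a temporary directory: hard link vs. symlink.
demo=$(mktemp -d)
echo "backup data" > "$demo/original"

ln "$demo/original" "$demo/hardlink"   # a second name for the same data
ln -s original "$demo/symlink"         # a pointer to the name "original"

links_before=$(stat -c '%h' "$demo/original")   # number of names for this file

rm "$demo/original"                    # remove the first name

hard_content=$(cat "$demo/hardlink")   # the data is still reachable
if [ -e "$demo/symlink" ]; then        # -e follows the symlink
    symlink_state=ok
else
    symlink_state=dangling
fi

echo "links before rm: $links_before"
echo "hard link still reads: $hard_content"
echo "symlink is: $symlink_state"

rm -rf "$demo"
```

The hard link keeps working after the original name is removed, while the symlink is left pointing at nothing.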

rsync's '--link-dest' option creates hard links for files that don't change. This makes it somewhat confusing when trying to determine if your script is working as intended, because if you were to check the size of all the files in your backup directory (e.g. using du -sh [directory] or by checking the properties in the GUI), it would look to be the same size as your original directory, regardless of how many of those files are actually hard links and thus not taking up any additional space.

Check the space on disk, using either df or via GUI tools that look at actual disk space. Then, run your backup script, and check again -- if no files changed, disk usage should not change at all (well, okay, a tiny bit -- it takes a small amount of space for the hard link itself); if files did change, disk usage will increase only by the size of the files that changed.
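A more direct check is to ask whether a file in the new backup and the same file in the previous backup are the same file on disk. The sketch below does that for every regular file; compare_snapshots is a hypothetical helper name, and the -ef test is not in POSIX but is supported by bash and dash.

```shell
#!/bin/sh
# Hypothetical helper: for each regular file in the new snapshot ($2), report
# whether it is a hard link to the same path in the old snapshot ($1).
# Pass absolute paths, since the file list is built from inside $2.
compare_snapshots() {
    old=$1
    new=$2
    (cd "$new" && find . -type f) | while IFS= read -r rel; do
        if [ "$old/$rel" -ef "$new/$rel" ]; then
            echo "linked  $rel"     # same inode: no extra space used
        else
            echo "copied  $rel"     # new or changed file: a real copy
        fi
    done
}
```

Called as compare_snapshots with the older back-... directory first and the newer one second, it should report unchanged files as linked. Running du -sh on both directories in a single command is another quick check, because du counts a hard-linked file only once per invocation.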

In either case, rsync's output will list the files it is checking, whether it's actually copying them or not. Look at the end, at the "speedup" value: it is the ratio of the total size of the files to the number of bytes rsync actually transferred. A value close to 1 means nearly everything was copied; a value well above 1 indicates that at least some files were hard-linked instead of copied.
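If these checks show that every file really was copied again, one detail in the question's script is worth ruling out. The shell expands ~ only at the start of a word, so in --link-dest=~/Backups/current the tilde reaches rsync unexpanded, and rsync treats a relative --link-dest path as relative to the destination directory. The sketch below, assuming sh or bash, shows what rsync would receive for each spelling.

```shell
#!/bin/sh
# Show the argument rsync would actually receive for each spelling.
as_written=$(printf '%s' --link-dest=~/Backups/current)
with_home=$(printf '%s' --link-dest="$HOME/Backups/current")

echo "as written: $as_written"    # the tilde is still there
echo "with HOME:  $with_home"     # an absolute path

# A corrected call would therefore look like this (not executed here):
#   rsync -aP --link-dest="$HOME/Backups/current" ~/Documents ~/Backups/back-$date
```

The other tildes in the script start a word, so they are expanded as expected.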

Kromey
  • 3,641
  • 4
  • 25
  • 30
2

You actually want to do this with hard links. The best tool for these sorts of backups on Linux machines is rsnapshot. It does exactly what you describe and is quite simple to set up.

Phil Hollenback
  • 14,947
  • 4
  • 35
  • 52