I'm running a simple bash script that uses rsync to do an incremental backup of my web server every hour. What I'm looking for is an efficient algorithm to delete the proper backups so that in the end I keep:
- hourly backups for 24 hours
- daily backups for 1 week
- weekly backups for 1 month
- monthly backups from that point on
I'll decide when to delete the monthly backups based on when I run out of space, so that part isn't a concern here. I'm also familiar with the various rsync wrappers such as rdiff-backup and rsnapshot, but I don't need them here. I prefer to write the code myself whenever possible, even if it means reinventing the wheel sometimes. At least that way if I get a flat tire I know how to fix it :)
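To make the retention rules concrete, here is a minimal sketch of them as a keep-or-delete check for a single backup. The function name `keep_backup` and its arguments are hypothetical. It assumes the daily backup is the one taken at midnight, the weekly one is Sunday's midnight backup, the monthly one is the first Sunday of the month, and that "1 month" means 31 days.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the script that runs every hour.
# Returns 0 (keep) or 1 (delete) for one backup.
#   $1  age of the backup in whole hours
#   $2  hour it was taken         (date +%H, 00-23)
#   $3  weekday it was taken      (date +%w, 0 = Sunday)
#   $4  day of month it was taken (date +%d, 01-31)
keep_backup() {
    local age=$1 hour=$2 weekday=$3 dom=$4

    # Hourly tier: everything from the last 24 hours stays.
    [ "$age" -le 24 ] && return 0

    # Every older tier only considers the midnight backup (assumption).
    [ "$hour" -eq 0 ] || return 1

    # Daily tier: midnight backups are kept for one week.
    [ "$age" -le $((7 * 24)) ] && return 0

    # Weekly and monthly tiers only consider Sundays (assumption).
    [ "$weekday" -eq 0 ] || return 1

    # Weekly tier: Sunday backups are kept for one month (assumed 31 days).
    [ "$age" -le $((31 * 24)) ] && return 0

    # Monthly tier: first Sunday of the month, kept until space runs out.
    [ "$dom" -le 7 ]
}
```

A cleanup pass could loop over the existing backup directories, work out the four values for each one, and remove every directory for which `keep_backup` returns non-zero. How those values are derived depends on how the directories are named, which is not shown here.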
Here's the actual commented code that runs every hour:
# if it's not Sunday and it's not midnight
if [ "$(date +%w)" -ne 0 ] && [ "$(date +%H)" -ne 0 ]; then
    # remove the backup from one day ago and one week ago
    rm -rf "$TRG1DAYAGO"
    rm -rf "$TRG1WEEKAGO"
fi
# if it's Sunday
if [ "$(date +%w)" -eq 0 ]; then
    # if it's midnight
    if [ "$(date +%H)" -eq 0 ]; then
        # if the day of the month is greater than 7
        # we know it's not the first Sunday of the month
        if [ "$(date +%d)" -gt 7 ]; then
            # delete the previous week's files
            rm -rf "$TRG1WEEKAGO"
        fi
    # if it's not midnight
    else
        # delete the previous day and week
        rm -rf "$TRG1DAYAGO"
        rm -rf "$TRG1WEEKAGO"
    fi
fi
So basically:
If it's not Sunday and it's not midnight:
- delete the backup from one day ago
- delete the backup from one week ago
If it is Sunday:
    If it is midnight:
        If the day of the month is greater than 7:
        - delete the backup from one week ago
    Else (if it's not midnight):
    - delete the backup from one day ago
    - delete the backup from one week ago
That seems to work but I'm wondering if anyone can come up with a simpler, more efficient algorithm or add any ideas for a better way of accomplishing this. Thanks.
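One observation on the code above: every run that is not at midnight deletes both targets, whether or not it is Sunday, so the nested branches appear to collapse to two conditions. A minimal sketch of that reduction follows. The function name `prune_targets` is hypothetical, and it only reports what would be deleted instead of calling `rm`, so it can be checked without touching any backups.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints which targets the hourly run would delete.
#   $1  hour          (date +%H, 00-23)
#   $2  weekday       (date +%w, 0 = Sunday)
#   $3  day of month  (date +%d, 01-31)
prune_targets() {
    local hour=$1 weekday=$2 dom=$3

    if [ "$hour" -ne 0 ]; then
        # Any non-midnight run: drop the backups from one day and one week ago.
        echo "day week"
    elif [ "$weekday" -eq 0 ] && [ "$dom" -gt 7 ]; then
        # Sunday midnight, but not the first Sunday of the month.
        echo "week"
    else
        # Midnight on any other day, or on the first Sunday: keep everything.
        echo "none"
    fi
}

# Possible wiring into the hourly script (left commented out on purpose):
# case "$(prune_targets "$(date +%H)" "$(date +%w)" "$(date +%d)")" in
#     "day week") rm -rf "$TRG1DAYAGO" "$TRG1WEEKAGO" ;;
#     "week")     rm -rf "$TRG1WEEKAGO" ;;
# esac
```

This keeps the behaviour of the code as posted, including the case where a midnight run on a day other than Sunday deletes nothing.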