I'm running a simple bash script that uses rsync to do an incremental backup of my web server every hour. What I'm looking for is an efficient algorithm to delete the proper backups so that in the end I keep:

  • hourly backups for 24 hours
  • daily backups for 1 week
  • weekly backups for 1 month
  • monthly backups from that point on

I'll figure out when to delete the monthly backups based upon when I run out of space. So we're not worried about that. I'm also familiar with the various wrappers for rsync like rdiff-backup and rsnapshot but those aren't necessary. I prefer to write code myself whenever possible even if it means reinventing the wheel sometimes. At least that way if I get a flat tire I know how to fix it :)

Here's the actual commented code that runs every hour:

# Sample the clock once so every test below sees the same instant
# (weekday 0 = Sunday, hour 00-23, day of month 01-31).
read -r WDAY HOUR DOM <<< "$(date '+%w %H %d')"

# if it's not Sunday and it's not midnight
if [ "$WDAY" -ne 0 ] && [ "$HOUR" -ne 0 ]; then
    # remove the backup from one day ago and one week ago
    rm -rf -- "$TRG1DAYAGO"
    rm -rf -- "$TRG1WEEKAGO"
fi

# if it's Sunday
if [ "$WDAY" -eq 0 ]; then
    # if it's midnight
    if [ "$HOUR" -eq 0 ]; then
        # if the day of the month is greater than 7
        # we know it's not the first Sunday of the month
        if [ "$DOM" -gt 7 ]; then
            # delete the previous week's files
            rm -rf -- "$TRG1WEEKAGO"
        fi
    # if it's not midnight
    else
        # delete the previous day and week
        rm -rf -- "$TRG1DAYAGO"
        rm -rf -- "$TRG1WEEKAGO"
    fi
fi

So basically:

If it's not Sunday and it's not midnight:

  - delete the backup from one day ago
  - delete the backup from one week ago

If it is Sunday:

    If it is midnight:

        If the day of the month is greater than 7:

          - delete the backup from one week ago

    Else (if it's not midnight):

      - delete the backup from one day ago
      - delete the backup from one week ago
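
For comparison, the branches above appear to collapse to two cases: any run that is not at midnight deletes both targets, and a Sunday-midnight run after the 7th deletes only the week-old one. The sketch below assumes bash. The function name `targets_to_delete` is hypothetical, and the function only prints what would be deleted rather than deleting anything.

```shell
#!/usr/bin/env bash
# Sketch: print which targets the hourly job would delete, given
# weekday (0 = Sunday), hour and day of month as produced by date +%w/%H/%d.
targets_to_delete() {
    local wday=$1
    # 10# forces base 10 so zero-padded values like "08" are not read as octal.
    local hour=$(( 10#$2 )) dom=$(( 10#$3 ))
    if (( hour != 0 )); then
        # Any non-midnight run, Sunday or not: both targets go.
        echo "day week"
    elif (( wday == 0 && dom > 7 )); then
        # Sunday midnight after the first week of the month.
        echo "week"
    fi
    # Midnight on other days, or on the first Sunday: nothing is printed.
}
```

Called as `targets_to_delete "$(date +%w)" "$(date +%H)" "$(date +%d)"`, it yields the same outcomes as the nested version for every combination of weekday, hour and day of month.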

That seems to work but I'm wondering if anyone can come up with a simpler, more efficient algorithm or add any ideas for a better way of accomplishing this. Thanks.

baquila
  • "I prefer to write code myself whenever possible even if it means reinventing the wheel sometimes." That's actually not a very realistic way to do things. It may seem appealing at first, but eventually you'll realize that you're completely buried in unmaintainable, insecure, poorly-performing self-made solutions, and will have a lovely time trying to dig yourself out. There are very mature tools around that do *exactly* this. You mentioned rdiff-backup. Rsnapshot is another. Do yourself a favor and just use one of those. – EEAA Jan 08 '15 at 20:53
  • On top of what @EEAA mentioned, asking other people to help you reinvent the wheel isn't a very appealing offer. – fukawi2 Jan 08 '15 at 22:53
  • Also include [dirvish](http://www.dirvish.org/) in your list of similar packages. If you must build your own, you could at least look at the source of these existing tools. – Zoredache Jan 09 '15 at 01:07
  • The problem here is that if you want to write code, it's not really a Server Fault question; if you have working code and are just looking for advice, it's not really a Stack Overflow problem either. You might want to try [codereview.stackexchange.com](http://codereview.stackexchange.com/) – Digital Chris Jan 09 '15 at 13:24
  • @DigitalChris Thanks! I think you're right....that's the better place for it. I wasn't aware of that site. Thanks. – baquila Jan 09 '15 at 16:19

1 Answer

> I'm wondering if anyone can come up with a simpler, more efficient algorithm or add any ideas for a better way of accomplishing this.

Yep - use rdiff-backup, rsnapshot, or some other option that you don't need to maintain.

I'm not being obtuse here. I realize you said you want to write your own code. Sometimes the correct answer here is not what you want to hear. I wrote this answer for you, but also for the thousands of people who will read your question in the future and see this answer.

If mature, well-vetted, freely-available solutions exist that do what you need, it is nearly always the best decision to go with one of those solutions instead of writing your own.

EEAA
  • [duplicity](http://duplicity.nongnu.org/) is another nice option. – xofer Jan 08 '15 at 21:09
  • @xofer Yep, agreed. – EEAA Jan 08 '15 at 21:12
  • @EEAA I get what you're saying. But if it's written properly, which is what I'm after, it doesn't need to be maintained. And besides, I'm not looking for the easy way out. I'm looking to build a solution I can count on to do exactly what I want done the way I want it done. And if it needs modifications in the future I need to be able to do that. Rdiff-backup and rsnapshot were written by people who take the same approach as I do and I was hoping to find some of those types here. So the correct answer will be the answer I want to hear. It will be an algorithm. – baquila Jan 09 '15 at 09:47
  • @baquila Fair enough. If that's what you're looking for, you'll need to go elsewhere. In professional system administration, homebrew solutions are to be avoided at all costs, especially for something as critical as backup software. – EEAA Jan 09 '15 at 12:40