
We run on Azure and I'm backing up our blob storage. There doesn't seem to be anything offered directly by Azure for accomplishing this: storage accounts offer redundancy, but not backup.

I'm looking for ways to break up the workload in a sensible fashion and didn't know if there were best practices or ideas around this. I have several hundred numbered containers, some with thousands of files in them, and at the rate the copy seems to run, the work will take something like 15 days. My script is limited to running for 3 hours due to limitations in Azure's Automation environment.

I've written a script that iterates over all of the containers and blobs. I need to check if each blob exists in the backup container. If not, copy it.
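To make that concrete, here is a minimal sketch of the loop using the azure-storage-blob Python SDK. It's an illustration rather than my actual runbook; the `backup` destination container and the connection-string environment variable are placeholder names.

```python
import os
from azure.storage.blob import BlobServiceClient

# Placeholder names: the env var and the "backup" container are illustrative only.
svc = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
backup = svc.get_container_client("backup")

for container in svc.list_containers():
    if container.name == "backup":
        continue  # don't try to back up the backup container itself
    src = svc.get_container_client(container.name)
    for blob in src.list_blobs():
        dest = backup.get_blob_client(f"{container.name}/{blob.name}")
        if dest.exists():
            continue  # already copied on a previous run
        # Server-side copy; if the backup lives in a different storage account,
        # the source URL needs a SAS token appended.
        dest.start_copy_from_url(src.get_blob_client(blob.name).url)
```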

The way I've broken this up is into manually maintained groups of 40 containers, but even those groups take longer than 3 hours (at least on the first run). Additionally, this solution isn't really scalable: we add customers all the time, and I don't want to have to maintain the lists by hand.

Since the containers are numbered, I've considered some sort of modulus division to break up the workload, say into 10 different jobs. That's gross but doable, though it will probably break before too long once we have enough activity to slow things down.
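As a sketch of what that modulus split might look like (the partition parameters would come from each scheduled job, and the digit-extraction helper is purely illustrative since it assumes a naming scheme like `customer-0421`):

```python
import os
from azure.storage.blob import BlobServiceClient

# Hypothetical job parameters: this run handles partition 3 of 10.
PARTITION_COUNT = 10
PARTITION_INDEX = 3

def container_number(name: str) -> int:
    """Pull the numeric part out of a container name, e.g. 'customer-0421' -> 421."""
    digits = "".join(ch for ch in name if ch.isdigit())
    return int(digits) if digits else 0

svc = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])

for container in svc.list_containers():
    if container_number(container.name) % PARTITION_COUNT != PARTITION_INDEX:
        continue  # one of the other nine jobs handles this container
    # ... back up this container's blobs as in the sketch above ...
```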

I was wondering if others have run into similar problems and whether there are other ways I might chop this up.

Finally, I could just move all of this to a VM and run the workload from there, but I was leaning towards using the available platform functionality.

Daniel
  • I had exactly the same reason for doing this, and the same problem with hitting the 3 hour limit with Azure Automation. In the end I moved the backup process to a small VM, as even if I got each segment under 3 hours, it would likely go back up later as more data is added. I've also considered looking at Azure functions to do this. – Sam Cogan Oct 21 '16 at 13:16
  • Cool, thanks. I'm assuming the same thing, at some point the segmentation will result in groups that are too large. I am wondering if perhaps running hourly will keep the changes small enough that it finishes quickly. That will just require testing. – Daniel Oct 21 '16 at 13:35
  • Yeah it would depend what kind of backup you do. I do a full one daily at the moment so that wouldn't help but if you could make it incremental it may. – Sam Cogan Oct 21 '16 at 13:54

1 Answer


If you want to continue to use Azure Automation you are going to need to either:

  • Split the tasks into units that are small enough to fit in under 3 hours (with significant headroom for growth) and run the jobs in parallel, or
  • Add checkpoints to your backup scripts so that Azure Automation can resume a job when it hits the 3-hour mark. However, my experience with this has been that it can take some time for jobs to restart, which is not ideal for a backup you want to complete within a set window (a crude, manual alternative is sketched after this list).
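Even without the native checkpoint feature, you can get a crude resume by persisting your own progress marker between runs. A rough Python sketch (untested; the `backup-state` container and `progress.txt` blob are made-up names, and this is a manual stand-in rather than Azure Automation's checkpointing):

```python
import os
import time
from azure.storage.blob import BlobServiceClient

svc = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
# Made-up container/blob used purely to remember how far the last run got.
state = svc.get_blob_client("backup-state", "progress.txt")
last_done = state.download_blob().readall().decode() if state.exists() else ""

deadline = time.monotonic() + 2.5 * 3600  # stop comfortably before the 3-hour limit

for name in sorted(c.name for c in svc.list_containers()):
    if name == "backup-state" or name <= last_done:
        continue  # already completed on a previous run
    if time.monotonic() > deadline:
        break  # the next scheduled run picks up from the marker
    # ... copy this container's blobs to the backup location ...
    state.upload_blob(name, overwrite=True)  # record the container as done
```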

You could look at doing incremental backups, or copying only what has changed, but bear in mind that if you are doing these backups to guard against accidental deletion or corruption, you don't want to do a pure copy of the changes, or you will just replicate the corruption.
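For example, one way to copy only what has changed while keeping earlier copies intact is to write recently modified blobs into a date-stamped prefix in the backup container, so yesterday's backup isn't overwritten by today's (possibly corrupted) data. A rough Python sketch, untested, with the one-day cutoff, the `backup` container name and the prefix layout all assumed for illustration:

```python
import datetime
import os
from azure.storage.blob import BlobServiceClient

svc = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
backup = svc.get_container_client("backup")  # assumed destination container

# Assume a daily schedule: anything modified in the last 24 hours gets copied.
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=1)
stamp = datetime.date.today().isoformat()  # e.g. a "2016-10-21/" prefix per run

for container in svc.list_containers():
    if container.name == "backup":
        continue
    src = svc.get_container_client(container.name)
    for blob in src.list_blobs():
        if blob.last_modified < cutoff:
            continue  # unchanged since the last run
        dest = backup.get_blob_client(f"{stamp}/{container.name}/{blob.name}")
        dest.start_copy_from_url(src.get_blob_client(blob.name).url)
```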

If you come to the conclusion that Azure Automation won't work, then you need to either run the job as a scheduled task on a VM, or possibly look at Azure Functions. Azure Functions would give you the freedom of running an unrestricted server process, but with the saving of only paying for the time the backup is running. It's not something I have tested yet, but this article seems to show some success.

The final option is to look at some third-party tools that will do this for you, such as Cherry Safe (again, I haven't tested this, I just know some people who use it).

Hopefully MS will eventually come out with a tool to do this.

Sam Cogan