1

We have an automated script that backs up 200 GB of data files to a local disk. The script shuts down the database, tars and compresses the entire directory locally on the disk, and then starts the database again:

tar -czvf data.tgz /some/folder

This process takes two hours, which is too long a downtime. We want to reduce this downtime.

Consider the following: the main goal is to have an identical copy of the files in the shortest possible time while the database is down. Later on, we can compress, transfer, or do any other operation on the files.

I was thinking of using rsync to sync the files to the backup target every week; rsync would transfer only the changes, which should take less time.
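
Something like this, for example (the destination path is just a placeholder):

rsync -a --delete /some/folder/ /backup/data/   # only changed files are transferred; deletions are mirrored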

Will that work, or is there a better approach?

    What kind of database? For some databases it is often good to setup replication between the master and a replica, and then do the backups against the replica which you can backup without interrupting anything. – Zoredache May 11 '18 at 22:25
  • As for the database: from experience, do not forget that the file contents (if you do a binary/filesystem backup instead of an SQL dump) depend on the OS, the word size (32- vs 64-bit), and the RDBMS system and version. It may be useful to also back up the database binaries so you can reload these specific backups in the future. – Patrick Mevzek May 11 '18 at 22:43
  • What about doing an export of the database contents from a running database? It might take too long as well, but if you keep a read replica and backup from that the time it takes is relatively unimportant. You could also do something like a weekly full backup then syncing the transaction logs so you can rebuild if something goes wrong. – Tim May 12 '18 at 23:11

6 Answers

2

Filesystem snapshots are the right way to go about doing something like this.
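
For example, a minimal sketch assuming /some/folder is a btrfs subvolume (the snapshot and backup paths are placeholders):

snap=/some/.snapshots/data-$(date +%F)
mkdir -p /some/.snapshots
# take an instant, read-only snapshot while the database is stopped
btrfs subvolume snapshot -r /some/folder "$snap"
# the database can be started again here; back up from the snapshot at leisure
tar -czf /backup/data.tgz -C "$snap" .
btrfs subvolume delete "$snap"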

84104
2

You could try the rsync tool for backup:

rsync -av host::src /dest 

For complete documentation, see the rsync man page: https://linux.die.net/man/1/rsync

bjoster
1

With regard to 84104's answer: make sure you flush tables and (read) lock them before making a snapshot. This ensures a more consistent snapshot without broken transactions. Also, regularly back up your transaction logs to an offsite location so you have point-in-time recovery when you require it. Ideally, do this on a replicated slave when possible.
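
A minimal sketch of that idea, assuming MySQL with its data directory on an LVM volume (volume names and sizes are placeholders); the mysql client's \! shell escape is used so the lock is held by the same session while the snapshot is created:

# hold a global read lock only for the instant the snapshot is taken
mysql -u root <<'EOF'
FLUSH TABLES WITH READ LOCK;   -- block writes; released when this session ends
\! lvcreate --snapshot --size 5G --name db-snap /dev/vg0/dbdata
UNLOCK TABLES;
EOF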

Rsync is, IMHO, not the way to go for databases.

Rick
1

Dirvish is what you're looking for. Any files that are identical across runs get hard-linked, so each backup is a full directory tree you can copy off, and it uses rsync underneath, so you save bandwidth on partially changed files.
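
Under the hood this is essentially rsync with --link-dest; a minimal sketch of the technique Dirvish automates (backup paths are placeholders):

today=/backup/data-$(date +%F)
latest=/backup/latest
# unchanged files are hard-linked to the previous run; the very first run is a full copy
rsync -a --delete --link-dest="$latest" /some/folder/ "$today"/
ln -sfn "$today" "$latest"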

Dessa Simpson
0

If the DBMS supports replication, consider setting up a replica on separate storage, and possibly at a remote site. If the primary fails, you may be able to promote the replica quickly.

But replication is not backup; backups are offline copies. Determine how to do backups without taking the database down: either the DBMS writes out the backup itself, or you tell it to suspend writes (or otherwise reach a safe point) and grab a copy of the files yourself.

The fast way to get a copy is a snapshot of the data volume. Fancy storage arrays can snapshot a LUN and present it to a different backup host. Or take an LVM-level snapshot to do it at the host level. Either way, the backup is not complete until it has been copied to separate, off-site media.
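
A hedged sketch of the host-level variant, including the off-site copy (volume names, mount point, and remote host are assumptions):

lvcreate --snapshot --size 10G --name data-snap /dev/vg0/data
mkdir -p /mnt/data-snap
mount -o ro /dev/vg0/data-snap /mnt/data-snap
# the backup only counts once a copy exists on separate, off-site media
rsync -a /mnt/data-snap/ backup@offsite.example.com:/backups/data/$(date +%F)/
umount /mnt/data-snap
lvremove -f /dev/vg0/data-snap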

John Mahowald
0

The ideal backup strategy strongly depends on the specific database you are running. Anyway, here is some generic advice to reduce downtime:

  • if your filesystem or volume manager supports snapshots, you can use them to greatly reduce the downtime. The workflow will be something similar to this (see the first sketch after this list):

    1. stop your database;
    2. create a snapshot;
    3. restart the database;
    4. run the backup process against your snapshot, rather than the live data.
  • if you can afford to lose the very latest transaction in your backup, you can modify the above sequence to avoid stopping/starting the database, effectively giving you a no-downtime backup process;

  • in case you can't rely on snapshots, you must decrease the copy time as much as possible. I strongly suggest trying tar --lzop -cvf, which uses the very fast lzo compressor (see the timing sketch after this list). The database must be stopped for the entire backup duration;

  • if this is not sufficient, you should try to copy only the changed blocks out of your data files. Try bdsync or blocksync to see if subsequent backups are faster than the first one. Note that both utilities work on single files, so you must script around them to copy multiple files. The database must be stopped for the entire backup duration;

  • rsync is not generally recommended for copying very large files; however, you can try something like rsync -a --inplace or, at the other extreme, rsync -a -W. You will have to run some timed benchmarks to discover which rsync invocation is better suited to your specific needs (see the timing sketch after this list). Again, this must be done with the database stopped for the entire backup duration;

  • if these approaches do not work, or are inapplicable to your case, you will have to set up a database-specific backup process (i.e. relying on replication or log shipping to a secondary host).
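
A minimal sketch of the snapshot workflow above, assuming LVM and a systemd-managed database service (service name, volume names, sizes, and paths are placeholders):

#!/bin/sh
set -e
systemctl stop mydb                                             # 1. stop the database
lvcreate --snapshot --size 10G --name data-snap /dev/vg0/data   # 2. create a snapshot
systemctl start mydb                                            # 3. restart the database
mkdir -p /mnt/data-snap                                         # 4. back up from the snapshot
mount -o ro /dev/vg0/data-snap /mnt/data-snap
tar --lzop -cvf /backup/data.tar.lzo -C /mnt/data-snap .
umount /mnt/data-snap
lvremove -f /dev/vg0/data-snap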
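
And a rough way to time the non-snapshot alternatives against your own data (destination paths are placeholders; for the rsync variants it is the second and later runs that matter):

time tar --lzop -cvf /backup/data.tar.lzo /some/folder
time rsync -a --inplace /some/folder/ /backup/data/
time rsync -a -W /some/folder/ /backup/data/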

shodanshok