0

I manage a number of databases on unix servers, and do daily backups of these databases using mysqldump. Since (some of) these databases are very large (20+Gb), I usually zip the backup .sql files using bzip2, to get compressed bz2 files.

As part of the backup process, I check that the size of the new backup file is greater than or equal to the size of the previous backup file - we are adding data to these databases on a daily basis, but very rarely remove data from these databases.

The check on the backup file size is a check on the quality of the backup - given that our databases primarily only grow in size, if the new backup is smaller than the old backup, it means either a) something has been removed from the database (in which case, I should check out what...) or b) something went wrong with the backup (in which case, I should check out why...).

However, if I compare the sizes of the bz2 files - for example, using comparison (using test) of stat %s, even though the database has increased in size, the bz2 files may have shrunk - presumably because of more efficient compression.

So - how can I compare the size of the backup files?

  • One option is to decompress the previous backup file from .bz2 to .sql, and compare the sizes of these .sql files. However, given that these are quite large files (20+Gb), the compression/decompression can take a while...
  • Another option is to keep the previous backup file as .sql, and again do the comparison of the .sql files. This is my preferred option, but requires some care to make sure we don't end up with lots of .sql files lying around - because that would eat up our hard drive pretty quickly.

Alternatively, someone in the SO community might have a better or brighter idea...?

amaidment
  • 6,942
  • 5
  • 52
  • 88
  • Before I migrate this anywhere, it would be useful if you could explain why you need to check the difference in sizes between the current and previous backup files, and why this couldn't be recorded somewhere such as in a local file? Thanks. – Kev Oct 23 '12 at 22:42
  • @Kev - edited Q to add motivation for the size comparison. As to recording backup file sizes in a file (presumably before compressing) and then reading from the file to compare with stat of the new backup file - that is a possible answer to my question. – amaidment Oct 24 '12 at 07:49
  • Since size of input file is not stored in bzip2 files (unlike gzip), your best option is to store it separately. – sendmoreinfo Dec 31 '12 at 13:53

1 Answers1

0

It's possible to split the input files into parts (100MB chunks for example) and compare them separately. As size might actually also stay the same even with different input, you should generally not use it for looking for differences - instead use something like cmp to see if the files differ.

It's also possible to just cat the bz2 files of the individual parts together and get a perfectly valid multi-stream bz2 file, which can be uncompressed again in whole without any problems. You might want to look into pbzip, which is a parallel implementation of bzip and uses exactly this mechanic for parallel bzip into a multi-stream bz2 file to speed up the process on smp/multi core systems.

As to why I would suggest splitting the files into parts: Depending on your mysql setup, it might be possible that some of your parts never change, and data might actually mostly get appended at the end - if you can make sure of this, you would only have to compare small parts of the whole dump, which would speed up the process.

Still, be aware, that the whole data could change without anything added or removed, as mysql might resort data in memory (OPTIMIZE command for example could result in this)

Another way of splitting the data is possible if you use InnoDB - in that case you can just tell mysql (using my.cnf) to use one file per table, so you could a) bzip those files individually and only compare the tables which might actually have changed (in case you have static data in some of the tables) and/or b) save the last modified date of a table file, and compare that beforehand (again, this is only really useful in case you have tables with only static data)

griffin
  • 1,261
  • 8
  • 24