We recently started using Yandex ClickHouse for our data backend, and I'm working on figuring out how best to back up our data. There seem to be two ways of doing this.
ALTER TABLE ... FREEZE PARTITION
The ALTER TABLE ... FREEZE PARTITION command appears to simply create a local snapshot of a partition. I'd have to write a script which discovers all of the partitions in each table, and then issue the appropriate command.
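A rough sketch of what that discovery script could look like. It assumes `clickhouse-client` is on the PATH and connects with its defaults, and that the server is recent enough to expose `partition_id` in `system.parts` and to accept the `PARTITION ID '...'` form of the ALTER. All function names here are hypothetical.

```python
import subprocess

# Only active parts matter; inactive ones are leftovers waiting to be merged away.
PARTITIONS_QUERY = (
    "SELECT DISTINCT database, table, partition_id "
    "FROM system.parts WHERE active AND database != 'system' "
    "ORDER BY database, table, partition_id"
)


def quote_ident(name):
    """Quote a database or table name with backticks."""
    return "`" + name.replace("`", "\\`") + "`"


def freeze_statements(rows):
    """Build one FREEZE statement per (database, table, partition_id) row."""
    statements = []
    for database, table, partition_id in rows:
        escaped_id = partition_id.replace("\\", "\\\\").replace("'", "\\'")
        statements.append(
            f"ALTER TABLE {quote_ident(database)}.{quote_ident(table)} "
            f"FREEZE PARTITION ID '{escaped_id}'"
        )
    return statements


def run_query(query):
    """Run a query via clickhouse-client and split its TSV output into rows."""
    result = subprocess.run(
        ["clickhouse-client", "--query", query],
        check=True, capture_output=True, text=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines() if line]


def freeze_all():
    """Freeze every active partition of every non-system table on this server."""
    for statement in freeze_statements(run_query(PARTITIONS_QUERY)):
        run_query(statement)
```

Calling `freeze_all()` once per server would freeze every active partition. Using the partition ID rather than the partition value avoids having to rebuild each table's partition expression as a literal.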
In order to get the backups off the system, I reckon I'd have to create a backup of the shadow directory on each server, and store that backup in another location (like S3 or something).
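One way to do the "create a backup of the shadow directory" step is to pack it into a timestamped, per-host archive that some other tool then ships off the box (for example `aws s3 cp`). This sketch assumes the default data path `/var/lib/clickhouse/`, so the shadow directory would be `/var/lib/clickhouse/shadow`; adjust it if the server's configured path differs. The function name is hypothetical, and the upload itself is left out.

```python
import os
import socket
import tarfile
import time

# Assumption: default ClickHouse data path; change if your config differs.
SHADOW_DIR = "/var/lib/clickhouse/shadow"


def archive_shadow(shadow_dir=SHADOW_DIR, out_dir="."):
    """Pack the shadow directory into a per-host, timestamped .tar.gz."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    host = socket.gethostname()
    archive_path = os.path.join(out_dir, f"clickhouse-shadow-{host}-{stamp}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname makes entries inside the archive start with "shadow/"
        tar.add(shadow_dir, arcname="shadow")
    return archive_path
```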
How would I keep the shadow directory clean? Can the freezes be deleted?
Data Dump
The other way I've seen to backup data is to simply dump it to files, as this page suggests.
https://github.com/resure/scpnet/wiki/ClickHouse-backup
I'd have to write a script which discovers all of the tables in each database, and then dump all of the data into a file. This reminds me a lot of using mysqldump to back up databases in MySQL, but without the ability to just dump everything in one command.
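A minimal sketch of that per-table dump, again assuming `clickhouse-client` is on the PATH with default connection settings. It lists tables from `system.tables`, skips views (their data lives elsewhere), and writes each table's `SHOW CREATE TABLE` output plus its rows in the `Native` format. The choice of `Native` and the file naming are my own assumptions; a file like this could later be loaded back with `INSERT INTO db.table FORMAT Native` fed on stdin. Function names are hypothetical.

```python
import os
import subprocess

# Skip system databases and view-like engines (View, MaterializedView, ...).
TABLES_QUERY = (
    "SELECT database, name FROM system.tables "
    "WHERE database NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema') "
    "AND engine NOT LIKE '%View' ORDER BY database, name"
)


def quote_ident(name):
    """Quote a database or table name with backticks."""
    return "`" + name.replace("`", "\\`") + "`"


def dump_plan(tables, out_dir):
    """Return (query, output_path) pairs: schema first, then data, per table."""
    plan = []
    for database, table in tables:
        full = f"{quote_ident(database)}.{quote_ident(table)}"
        base = os.path.join(out_dir, f"{database}.{table}")
        plan.append((f"SHOW CREATE TABLE {full} FORMAT TabSeparatedRaw", base + ".sql"))
        plan.append((f"SELECT * FROM {full} FORMAT Native", base + ".native"))
    return plan


def list_tables():
    """Ask the server for every (database, table) pair worth dumping."""
    out = subprocess.run(
        ["clickhouse-client", "--query", TABLES_QUERY],
        check=True, capture_output=True, text=True,
    ).stdout
    return [tuple(line.split("\t")) for line in out.splitlines() if line]


def dump_all(out_dir):
    """Write schema and data files for every table into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for query, path in dump_plan(list_tables(), out_dir):
        with open(path, "wb") as fh:
            # Stream clickhouse-client's stdout straight into the file.
            subprocess.run(["clickhouse-client", "--query", query], stdout=fh, check=True)
```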
Personally, I'm leaning towards this solution, as it seems to be much easier to script and maintain, but I'm really curious to know what others are doing.