We have a 10-node Cassandra cluster, and we configured a repair in OpsCenter. We noticed that a backups folder has been created for every table in the OpsCenter keyspace, and these folders keep growing. Is there a solution to this, or do we have to manually delete the data in each backups folder?
1 Answer
First off, backups are different from snapshots. You can take a look at the backup documentation for OpsCenter to learn more.
Incremental backups:
From the DataStax docs:
When incremental backups are enabled (disabled by default), Cassandra hard-links each flushed SSTable to a backups directory under the keyspace data directory. This allows storing backups offsite without transferring entire snapshots. Also, incremental backups combine with snapshots to provide a dependable, up-to-date backup mechanism. ... As with snapshots, Cassandra does not automatically clear incremental backup files. DataStax recommends setting up a process to clear incremental backup hard-links each time a new snapshot is created.
You must have turned on incremental backups by setting incremental_backups to true in cassandra.yaml.
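To confirm that on a node, here is a minimal Python sketch that scans the text of cassandra.yaml for an uncommented incremental_backups line. The function name and the sample text are made up for illustration, and it uses a plain line scan rather than a real YAML parser, so treat it as a quick check only.

```python
import re


def incremental_backups_enabled(yaml_text: str) -> bool:
    """Return True if an uncommented incremental_backups line is set to true."""
    for line in yaml_text.splitlines():
        # Drop everything after '#', so commented-out settings are ignored.
        setting = line.split("#", 1)[0].strip()
        match = re.fullmatch(r"incremental_backups\s*:\s*(\S+)", setting)
        if match:
            return match.group(1).lower() == "true"
    # The setting is disabled by default when the key is absent.
    return False


# Hypothetical excerpt of a cassandra.yaml file, for illustration only.
sample = """
# incremental_backups: false
incremental_backups: true   # turned on at some point
"""
print(incremental_backups_enabled(sample))
```

On a real node you would pass in the contents of your own cassandra.yaml; its location depends on how Cassandra was installed.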
If you are interested in a backup strategy, I recommend you use the OpsCenter Backup Service instead. That way, you can control at a granular level which keyspaces you want to back up, and you can push your files to S3.
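If you keep incremental backups on, the clearing process mentioned in the docs quote could look roughly like the Python sketch below. It assumes every directory named backups under the data root holds only incremental backup hard links, and the helper names are made up for illustration. It defaults to a dry run that only reports sizes.

```python
import os


def backup_dirs(data_root):
    """Yield (path, total_bytes) for every directory named 'backups' under data_root."""
    for dirpath, dirnames, _ in os.walk(data_root):
        if os.path.basename(dirpath) == "backups":
            total = sum(
                os.path.getsize(os.path.join(dirpath, name))
                for name in os.listdir(dirpath)
                if os.path.isfile(os.path.join(dirpath, name))
            )
            dirnames[:] = []  # do not descend below a backups directory
            yield dirpath, total


def clear_backups(data_root, dry_run=True):
    """Remove files directly inside each backups directory.

    Returns the number of bytes those files reference. Because the files are
    hard links, disk space is only released once the live SSTable they point
    to has also been removed (for example by compaction).
    """
    referenced = 0
    for path, size in backup_dirs(data_root):
        referenced += size
        if not dry_run:
            for name in os.listdir(path):
                full = os.path.join(path, name)
                if os.path.isfile(full):
                    os.remove(full)
    return referenced
```

For example, `clear_backups("/mnt/cassandra/lib/data")` only reports how many bytes the backup links reference on the data path shown in the comments, and passing `dry_run=False` actually removes them. Copy anything you want to keep offsite first, and take a fresh snapshot before clearing.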
Snapshots
Snapshots are hard links to SSTables, so they keep old data files on disk even after Cassandra no longer uses them. Snapshots protect you from yourself: for example, if you accidentally truncate the wrong table, you'll still have a snapshot of that table that you can restore. If you end up with too many snapshots, there are a couple of things you can do:
Don't run sequential repairs
This relates to repairs because sequential repairs generate a snapshot each time they run. To avoid this, run parallel repairs instead (with the -par flag, or by setting the number of repairs in the OpsCenter config file; see the note below).
Clear your snapshots
If you have too many snapshots and need to free up space (maybe once you have backed them up to S3, Glacier, or similar), go ahead and use nodetool clearsnapshot to delete them. You can also remove them manually from your file system, but nodetool clearsnapshot removes the risk of running rm -rf on the wrong thing.
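If you want to script this, a small Python wrapper can build the nodetool command and show it before running it. This is only a sketch: the function names are made up, it assumes nodetool is on the PATH, and the way a bare clearsnapshot behaves differs between Cassandra releases, so check `nodetool help clearsnapshot` for your version first.

```python
import subprocess


def clearsnapshot_cmd(tag=None, keyspaces=()):
    """Build the argument list for nodetool clearsnapshot."""
    cmd = ["nodetool", "clearsnapshot"]
    if tag:
        cmd += ["-t", tag]  # only remove the snapshot with this name
    cmd += list(keyspaces)  # restrict to these keyspaces, if any are given
    return cmd


def clear_snapshots(tag=None, keyspaces=(), dry_run=True):
    """Return the command; run it as well when dry_run is False."""
    cmd = clearsnapshot_cmd(tag, keyspaces)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: shows what would be executed for the OpsCenter keyspace.
print(" ".join(clear_snapshots(keyspaces=["OpsCenter"])))
```

Keeping the dry run as the default means you see exactly which keyspaces are targeted before anything is deleted.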
Note: You may also be running repairs too fast if you don't have a ton of data (check my response to this other SO question for an explanation and the repair service config levers).
-
Thanks for the explanation, phact. But the issue I see is with the "OpsCenter" and "system" keyspaces. The actual business data is less than 400MB on each node. Below is the directory size listing for the OpsCenter keyspace: 4.0K /mnt/cassandra/lib/data/OpsCenter/rollups300/snapshots 2.7G /mnt/cassandra/lib/data/OpsCenter/rollups300/backups 2.8G /mnt/cassandra/lib/data/OpsCenter/rollups300 – kris433 Apr 09 '15 at 21:06
-
Sure, snapshots work the same on all keyspaces. You mentioned a backups folder; did you mean the snapshots directory? OpsCenter stores time series data for all the nodes in your cluster. There are ways to store less data if it's getting too big. The system keyspace should always be small. – phact Apr 09 '15 at 21:08
-
4.5G /mnt/cassandra/lib/data/system/compactions_in_progress/backups 4.5G /mnt/cassandra/lib/data/system/compactions_in_progress The above is some info from the "system" keyspace, phact. You can see that the backups folder is really huge. The same happens for the "OpsCenter" keyspace. I initiated the repair in OpsCenter and configured it for 9 days. – kris433 Apr 09 '15 at 21:11
-
I'm aware that after taking a snapshot, every newly created SSTable is also linked into the incremental backups folder, and that this folder has to be cleared manually. We have been doing that for the tables related to business data. But I do not know how the "backups" folder was created in the "system" and "OpsCenter" keyspace directories. – kris433 Apr 09 '15 at 21:20
-
Yeah, so the issue is that you turned on incremental backups. This is a backup issue, not a repair issue. – phact Apr 09 '15 at 22:13
-
Thanks for the reply, phact. The backups folder under "/mnt/cassandra/lib/data/system/compactions_in_progress" is almost 4.5G. Could I delete just the backups folder in there? – kris433 Apr 09 '15 at 22:27