
I have a Cassandra 3-node cluster and a keyspace created with a replication_factor of 3.

I back up this keyspace with `nodetool snapshot`. As recommended by the Cassandra documentation, to make a global backup I run it from a cron job on each node (the 3 nodes are NTP-synchronized). I'm not using incremental snapshots; each run takes a new global snapshot.
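For reference, the per-node job described above could be sketched roughly as follows. This is only an illustration: the keyspace name `my_keyspace`, the `backup-` tag format, and both helper functions are hypothetical, and the code only builds the command rather than running it. The one thing taken from the tool itself is the `nodetool snapshot -t <tag> <keyspace>` form.

```python
from datetime import datetime
from typing import List


def snapshot_tag(now: datetime) -> str:
    """Build a tag from the scheduled time so all three nodes use the same name."""
    return now.strftime("backup-%Y%m%d-%H%M")


def snapshot_command(keyspace: str, tag: str) -> List[str]:
    """Argument list for a full snapshot of one keyspace on the local node."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


# Hypothetical keyspace name and run time, shown only to illustrate the result.
command = snapshot_command("my_keyspace", snapshot_tag(datetime(2016, 12, 13, 2, 0)))
print(" ".join(command))
# prints: nodetool snapshot -t backup-20161213-0200 my_keyspace
```

Deriving the tag from the scheduled time rather than from a per-node counter makes it easy to tell which three snapshots belong to the same global backup.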

Unfortunately, I'm having some trouble with the restore process.

First of all, I set the replication factor to 3 (with QUORUM consistency on READ and WRITE operations) to make sure my app keeps working even if 1 node is down.

  • My first scenario is not really a restore process: one node goes down because, let's say, someone or something shut down the VM that the node was running on. The 2 other nodes keep working and receiving read/write requests. 24 hours later, I manage to restart the VM of the first node; all services and files are still there, and I'm about to restart the node. Are there any actions I should take before or after restarting it?

  • The second scenario is much the same, but I was not able to recover the VM of the first node and I need to reinstall everything on it, including Cassandra. How should I use my backup to resync this node? Should I even use it, or is Cassandra able to resync everything without me having to restore anything? What exactly should I do in this case?

  • My last scenario is different. I've lost all my nodes and cannot recover anything. I have my global snapshot (3 snapshots, 1 for each node, taken at the same time). What is the process in this case?

I've read the Cassandra documentation for the restore process, and I would prefer the simple copy-restore (in other words, I'd rather not use sstableloader). I have trouble understanding when I should use the refresh and/or repair commands in those scenarios.

1 Answer


> I have trouble understanding when I should use the refresh and/or repair commands in those scenarios

According to the documentation, you should run `nodetool refresh` when you restore data from a snapshot, which applies to the 2nd and 3rd scenarios.

I believe repair is not a strictly required step in any of the three scenarios. But I would recommend running it, because it is an easy and useful way to make sure the data on the just-restored nodes is consistent.

Furthermore, running repair on a regular basis is a recommended part of Cassandra cluster maintenance.
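As a rough illustration of the copy-restore order (copy the snapshot files into the table's data directory, then refresh, then repair), here is a minimal Python sketch. It assumes the schema has already been recreated so the table directory exists. The function names and all paths are hypothetical. The only nodetool forms used are `nodetool refresh <keyspace> <table>` and `nodetool repair <keyspace>`, and the commands are only built here, not executed.

```python
import shutil
from pathlib import Path
from typing import List


def copy_snapshot(snapshot_dir: Path, table_dir: Path) -> List[str]:
    """Copy every regular file of one table's snapshot into its live data directory.

    Returns the names of the copied files, sorted. Subdirectories are skipped.
    """
    copied = []
    for src in sorted(snapshot_dir.iterdir()):
        if src.is_file():
            shutil.copy2(src, table_dir / src.name)
            copied.append(src.name)
    return copied


def post_copy_commands(keyspace: str, table: str) -> List[List[str]]:
    """Commands to run on the node once the files are in place, in order."""
    return [
        ["nodetool", "refresh", keyspace, table],  # load the newly copied SSTables
        ["nodetool", "repair", keyspace],          # bring replicas back in sync
    ]
```

The copy step has to be repeated for each table of the keyspace on each restored node, followed by a refresh of that table.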

Mikhail Baksheev
  • Thank you for your answer. But in the first scenario, after restarting a node that was shut down for a few hours or days, should I do something else besides `nodetool repair`? And for the second one, should I use my snapshot from the lost node, or is Cassandra able to resync everything by itself using the 2 other nodes? – The Wingman Dec 13 '16 at 14:38
  • @TheWingman, repair is enough for the first scenario, even if the node was down for a long time. And for the second one, Cassandra can bootstrap data from the other nodes (http://cassandra.apache.org/doc/latest/operating/topo_changes.html#bootstrap), but it can take a long time compared to restoring from a snapshot. – Mikhail Baksheev Dec 13 '16 at 16:19