This details how to replicate data to a new cluster:

https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_snapshot_restore_new_cluster.html

Couldn't a similar scheme be used to rapidly scale out a cluster with existing data? Say, take a snapshot of every node, copy each snapshot to a new node, set the tokens in cassandra.yaml, point the seeds at the old instances, and then join them up?
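Concretely, the scheme described above would amount to something like the following cassandra.yaml fragment on each new node (an illustration only — the cluster name, addresses, and token values here are made-up placeholders):

```yaml
# Hypothetical cassandra.yaml fragment for a new node started from a
# snapshot of an existing node. All values are placeholders.
cluster_name: 'MyCluster'
num_tokens: 256
# Comma-separated list of tokens copied from the source node
# (e.g. taken from `nodetool ring` output on that node):
initial_token: '-9181802744821788684,-9030157499517795672'
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # Point the seeds at the existing (old) instances:
      - seeds: '10.0.0.1,10.0.0.2'
```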

Won't they be treated like nodes that once were part of the cluster and were rejoined?

1 Answer


That won't work, because snapshots are specific to the node on which they are taken. Once you add (or remove) a node, the token ranges on all nodes are recalculated, immediately invalidating any existing snapshots. Restoring a snapshot to another node would appear to work, but that node would only serve the data which happened to match its token ranges.

Plus, it would claim responsibility for any data matching its token ranges, whether or not the snapshot you restored from actually contained that data. Not a good scenario.
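To see why, here is a minimal sketch of how a partition's token determines which node owns it. This assumes single-token nodes on the Murmur3 ring for simplicity (real clusters with vnodes have many tokens per node, but the principle is identical); the node names and token values are made up:

```python
# Hypothetical illustration: which node owns a given partition token?
# Single-token nodes on Murmur3's signed 64-bit ring; with vnodes the
# ring just has more entries per node.
from bisect import bisect_left

# Made-up (token, node) assignments, sorted by token.
ring = [
    (-6148914691236517206, "node1"),
    (-2,                   "node2"),
    ( 6148914691236517202, "node3"),
]

def owner(partition_token):
    """A token is owned by the first node whose ring token is >= it,
    wrapping around to the lowest ring token past the end."""
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, partition_token)
    return ring[i % len(ring)][1]

# A partition hashing to token 100 lands in node3's primary range ...
print(owner(100))                    # node3
# ... so a snapshot taken on node1 contains no data for that token,
# yet a new node restored from node1's snapshot would still claim it
# if it joined the ring with node3's token.
print(owner(9000000000000000000))    # past the top: wraps to node1
```

The point being: the snapshot's contents and the token ranges a node announces are completely independent, so a mismatched restore silently serves empty results for ranges it claims.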

Aaron
  • Yes, but you can hardcode the token ranges in the yaml using initial_token; they do it in the linked article under different circumstances. – Constance Eustace Feb 14 '18 at 17:21
  • @ConstanceEustace Even if you are in full control of token range assignment, the only way restoring a snapshot to a new node makes sense, is if the token ranges on the new node matched another. And Cassandra won't allow that. – Aaron Feb 14 '18 at 17:25
  • So the vnode tokens must be distinct for each node in the cluster? Cassandra needs some sneakernet node standup capability. Perhaps we could let a node join, look at its tokens, and then use a special sstableloader that only sends in locally relevant tokens from sstable snapshots from other nodes. That would probably STILL stream data to other nodes, though. Is the only way to do expansion to add nodes and wait for natural streaming to occur? – Constance Eustace Feb 14 '18 at 17:44
  • @ConstanceEustace Correct. Streaming data to a new node is pretty much the only way. If streaming at bootstrap-time is an issue, you can always use `nodetool rebuild` if you have another data center to stream from. – Aaron Feb 14 '18 at 18:43
  • After more reading: vnode tokens are randomly generated, and I believe that is true of "new" nodes as well. In theory this should be possible. If we wanted to double the cluster: 1) put snapshots onto three new nodes; 2) on each of those nodes, take the tokens of the node being copied and calculate the midpoint of each primary range that node was responsible for; 3) set the new nodes to those calculated midpoint tokens. The RF should be maintained, since the adjacent primary ranges hold the replicas, and the new node already carries a replica of the old one's data. I will try a test – Constance Eustace Feb 20 '18 at 17:27
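The midpoint calculation proposed in the last comment could be sketched like this. This is a rough illustration only, with made-up token values in Murmur3's signed 64-bit space, and it deliberately ignores the wraparound complication at the bottom of the ring:

```python
# Sketch of the commenter's idea: for each token owned by an existing
# node, find the token that precedes it on the ring and compute the
# midpoint of that primary range; the "cloned" node would then be
# assigned those midpoints. Token values here are made up.

MIN_TOKEN = -2**63  # bottom of the Murmur3 token space

def midpoints(node_tokens, all_ring_tokens):
    """Return the midpoint of each primary range ending at one of
    node_tokens. Simplified: treats the ring as a sorted line, using
    MIN_TOKEN as the predecessor of the lowest token instead of
    wrapping around."""
    ring = sorted(all_ring_tokens)
    out = []
    for t in node_tokens:
        i = ring.index(t)
        prev = ring[i - 1] if i > 0 else MIN_TOKEN
        out.append((prev + t) // 2)
    return out

# Three-node ring with one token each (vnodes would just mean more
# tokens per node):
ring = [-6000, -2000, 4000]
# A "clone" of the node owning token 4000 would take the midpoint of
# the primary range (-2000, 4000]:
print(midpoints([4000], ring))   # [1000]
```

Whether Cassandra would then replicate correctly at join time is exactly what the proposed test would need to verify; the calculation itself is the easy part.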