
I understand that with vnodes you don't have to rebalance token ranges, but when do we actually use them in production? Does a vnode function the same way as a physical single-token node? If so, why use single-token nodes at all? Do vnodes help if I have a large amount of data and a large cluster (say, 300 nodes)?

user1870400

2 Answers


The main benefit of using vnodes is that the data streamed when bootstrapping a new node is more evenly distributed. Why? When a new node is added, it requests the data in its token ranges. Optimally, the data it requests would be spread evenly across all nodes, reducing the workload on each node sending data to the bootstrapping node (and also speeding up the bootstrap process).
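To make the streaming argument concrete, here is a minimal simulation sketch, not how Cassandra itself computes streaming sources. It assumes a simplified token ring of [0, 2**64) (real Murmur3 tokens are signed), replication factor 1, and randomly chosen tokens for every node, including the new one; `build_ring` and `bootstrap_sources` are hypothetical helper names. With RF=1, each new token takes over part of the range previously owned by the next existing token clockwise, so that node is the one that streams the data.

```python
import bisect
import random

# Simplified token space; real Murmur3Partitioner tokens span [-2**63, 2**63 - 1].
RING_SIZE = 2**64


def build_ring(num_nodes, tokens_per_node, rng):
    """Map each randomly chosen token to the id of the node that owns it."""
    ring = {}
    for node in range(num_nodes):
        for _ in range(tokens_per_node):
            ring[rng.randrange(RING_SIZE)] = node
    return ring


def bootstrap_sources(ring, tokens_per_node, rng):
    """Return the set of existing nodes that would stream data to a new node.

    Assumes replication factor 1: a new token t takes over part of the range
    ending at the first existing token at or after t (wrapping around the
    ring), so the owner of that token is the streaming source.
    """
    sorted_tokens = sorted(ring)
    sources = set()
    for _ in range(tokens_per_node):
        t = rng.randrange(RING_SIZE)
        i = bisect.bisect_left(sorted_tokens, t) % len(sorted_tokens)
        sources.add(ring[sorted_tokens[i]])
    return sources


if __name__ == "__main__":
    rng = random.Random(42)
    for num_tokens in (1, 256):
        ring = build_ring(12, num_tokens, rng)
        sources = bootstrap_sources(ring, num_tokens, rng)
        print(f"num_tokens={num_tokens}: {len(sources)} of 12 nodes stream data")
```

With one token per node, exactly one existing node does all the streaming; with 256 tokens per node (the `num_tokens` default in older `cassandra.yaml` files), the work is typically spread across all 12 nodes. Real clusters with RF > 1 can pick among replicas, which this sketch ignores.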

Once you have a large number of physical nodes, such as the 300 in your example, this benefit would seem to be reduced (assuming no hot spots or data-partitioning issues). I'm not aware of any actual guidelines on the number of nodes at which to use or not use vnodes, beyond what is in the documentation. Yes, vnodes are used in production.

More information can be found here: http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/config/configVnodes.html

Chris Gerlt

In addition to Chris' excellent answer, I'll add one point. In a large cluster, it is helpful to let Cassandra manage the token ranges, which is what vnodes do. Without vnodes, you would have to calculate and re-specify the token for each new (and existing) node yourself. With vnodes, Cassandra handles that for you.

Compare the difference in the steps listed in the documentation:

Adding a node without vnodes: http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsAddRplSingleTokenNodes.html

vs.

Adding with vnodes: http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_node_to_cluster_t.html
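The manual token sizing mentioned above can be sketched as follows. This is a minimal illustration, assuming the Murmur3Partitioner range [-2**63, 2**63 - 1] and the common practice of spacing single-token nodes evenly (`initial_token = i * 2**64 / N - 2**63`); `balanced_tokens` and `nodes_to_move` are hypothetical helper names. It also counts how many existing nodes would need a new token (for example via `nodetool move`) to keep the ring balanced after growing the cluster, under the simple assumption that node i keeps slot i.

```python
# Murmur3Partitioner tokens span [-2**63, 2**63 - 1].
MIN_TOKEN = -(2**63)


def balanced_tokens(num_nodes):
    """Evenly spaced initial_token values for num_nodes single-token nodes."""
    step = 2**64 // num_nodes
    return [MIN_TOKEN + i * step for i in range(num_nodes)]


def nodes_to_move(old_count, new_count):
    """Count existing nodes whose token is not part of the new balanced layout.

    Assumes the simple scheme where every layout starts at MIN_TOKEN, so an
    existing node can stay put only if its token appears in the new layout.
    """
    new_tokens = set(balanced_tokens(new_count))
    return sum(1 for t in balanced_tokens(old_count) if t not in new_tokens)


if __name__ == "__main__":
    for old, new in [(4, 5), (4, 8), (300, 301)]:
        print(f"{old} -> {new} nodes: {nodes_to_move(old, new)} existing nodes must move")
```

Under this scheme, going from 4 to 5 nodes means 3 of the 4 existing nodes must move, while doubling from 4 to 8 needs no moves at all, which is why single-token clusters were often grown by doubling. With vnodes, none of this bookkeeping is needed.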

Aaron