
While reading the Cassandra documentation, I came across the term "clustering growth". From blog posts, I understand that clustering is a way of grouping servers (distributed servers) over a LAN, and that it relies on data sharding and partitioning algorithms behind the scenes. But in a distributed system we also scale servers horizontally and distribute the load across them, so it seems those servers are achieving the same clustering properties. I basically want to know the difference between clustering servers and replicating servers behind a load balancer.

I want to know the difference between the two. I knew clustering as a database technique, but I have also seen clustered servers. Is clustering a form of horizontal scaling, or something else? I have not been able to find a precise answer.

1 Answer


In Cassandra we don't tend to scale vertically unless nodes are under-provisioned. The ideas of 'clustering' and 'replication' are built into the very nature of how Cassandra is meant to work.

While you can run Cassandra on a single node, it is designed as a distributed database, so it is most common to have multiple nodes. A group of nodes communicating with each other to make up a distributed database is what we refer to as a cluster. The more nodes you add to a cluster, the more data ownership and workload are spread out, which is where the idea of scaling horizontally comes from.
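To make "data ownership is spread out" concrete, here is a minimal sketch of a token ring. It is a simplified illustration, not Cassandra's actual partitioner: the `Ring` class, the `token` and `ownership` helpers, the node names, and the use of MD5 are all assumptions made for the example.

```python
import bisect
import hashlib
from collections import Counter


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is used here only for illustration)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    """Toy token ring: every node owns several positions on the ring."""

    def __init__(self, nodes, positions_per_node=64):
        # Several positions per node help even out how much each node owns.
        self._ring = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(positions_per_node)
        )
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key: str) -> str:
        # The first position at or after the key's token owns the key;
        # the modulo wraps around past the end of the ring.
        idx = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return self._ring[idx][1]


def ownership(nodes, keys):
    """Count how many keys each node owns."""
    ring = Ring(nodes)
    return Counter(ring.owner(k) for k in keys)


keys = [f"user-{i}" for i in range(10_000)]
three = ownership(["node1", "node2", "node3"], keys)
six = ownership([f"node{i}" for i in range(1, 7)], keys)

# With more nodes in the ring, each node is responsible for fewer keys.
print("busiest node, 3-node cluster:", max(three.values()))
print("busiest node, 6-node cluster:", max(six.values()))
```

Going from three to six nodes roughly halves the share of keys any single node is responsible for, which is the horizontal-scaling effect described above.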

So, to answer your question, 'clustering' is certainly a way of scaling horizontally: nodes are added to a common cluster to increase throughput. You can also think of a cluster as a logical way to organize data. A Cassandra cluster can have one or more datacenters (DCs), each responsible for one or more copies of the data (replicas), depending on how you define things. I would recommend this quick read for a better understanding: https://cassandra.apache.org/_/cassandra-basics.html
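As a rough illustration of how replicas relate to the cluster, the sketch below picks the nodes that would hold copies of a key by walking a ring from the key's position. It is a simplification under stated assumptions: one ring position per node, a single datacenter, unique node names, and a hypothetical `replicas` helper with an `rf` (replication factor) parameter. Cassandra's real replication strategies are more involved.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is used here only for illustration)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def replicas(key, nodes, rf):
    """Return the nodes that would hold a copy of `key`.

    Starts at the node owning the key and takes the next nodes around the
    ring until `rf` copies are placed (or the cluster runs out of nodes).
    """
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(key))
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["node1", "node2", "node3", "node4", "node5"]
for key in ("user-1", "user-2", "user-3"):
    print(key, "->", replicas(key, nodes, rf=3))
```

Partitioning decides which node owns a key, and replication decides how many additional nodes keep a copy of it. Both happen inside the same cluster.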

Paul
  • I now understand the difference between clustering and replicas in the case of a database: clustering distributes the data among nodes, and those clusters are in turn replicated across DCs. My follow-up question is whether domain/business servers are also clustered. Is there any situation where we cluster a business server? And what is the correct term when we horizontally scale business servers behind an LB: replica or cluster? A useful link on the difference between the two: https://stackoverflow.com/questions/19720010/difference-between-sharding-and-replication-on-mongodb – ZoroSenpai Nov 28 '22 at 00:55
  • In general, no, most don't find it necessary to replicate on the server, which would typically occur at the disk level. In Cassandra, utilizing multiple nodes with multiple replicas is typically sufficient and there would be no need to scale at the server level. – Paul Nov 29 '22 at 15:28