
I have two DataStax clusters on Google Cloud, in two different accounts. The two clusters hold different keyspaces (data).

I want to combine the two clusters and use the nodes from both to handle the load.

I don't want to import/export the data. Both clusters are small and neither can handle the load on its own, so I want to merge them and use all of the nodes together.

Is there any way to do this, whether or not the clusters are in the cloud?

Thanks,

user374374

2 Answers


It is doable but tricky.

Cassandra decides which nodes belong to a cluster based on the cluster name. If the two clusters do not have the same name, the first step is to rename them so that they share one.

The second step is to pick one cluster as the parent cluster, which the other cluster's nodes will join; call the other one the joining cluster. In this step, define on the parent cluster the keyspaces and column families that exist on the joining cluster, with exactly the same definitions. At this stage the parent cluster has the keyspace definitions but none of the joining cluster's data. Likewise, on the joining cluster, define the keyspaces that exist on the parent cluster in exactly the same way.
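The schema step can be sketched as a comparison of the two clusters' keyspace definitions. This is a minimal sketch, assuming each cluster's user keyspaces (system keyspaces excluded) have been exported as a mapping from keyspace name to its CQL definition, for example the text cqlsh prints for DESCRIBE KEYSPACE. The function name schema_plan and the keyspace names are hypothetical. It reports which keyspaces each side still has to create and stops if the same keyspace name is defined differently on the two clusters.

```python
from typing import Dict, List, Tuple


def schema_plan(parent: Dict[str, str],
                joining: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (keyspaces to create on parent, keyspaces to create on joining).

    Hypothetical helper: both arguments map keyspace name -> CQL definition.
    """
    # A keyspace that exists on both sides must be defined identically,
    # otherwise the merged cluster would end up with conflicting schemas.
    conflicts = sorted(
        name for name in parent.keys() & joining.keys()
        if parent[name].strip() != joining[name].strip()
    )
    if conflicts:
        raise ValueError(f"keyspaces defined differently on each cluster: {conflicts}")
    create_on_parent = sorted(joining.keys() - parent.keys())
    create_on_joining = sorted(parent.keys() - joining.keys())
    return create_on_parent, create_on_joining


if __name__ == "__main__":
    parent = {"app_a": "CREATE KEYSPACE app_a WITH replication = "
                       "{'class': 'SimpleStrategy', 'replication_factor': 3};"}
    joining = {"app_b": "CREATE KEYSPACE app_b WITH replication = "
                        "{'class': 'SimpleStrategy', 'replication_factor': 3};"}
    print(schema_plan(parent, joining))  # (['app_b'], ['app_a'])
```

The check only compares definitions as text; actually creating the keyspaces and tables is still done with CQL on each cluster.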

The nodes in both clusters must have public interfaces in order to communicate. I am not sure how this is done on Google Cloud, but I am sure you can give public interfaces to your instances in both accounts. Then treat the two clusters as two different datacenters in Cassandra terms. Once all the machines can reach each other's Cassandra ports, change cassandra.yaml on each cluster and add the other cluster's nodes to it. If you are using the property file snitch (PropertyFileSnitch, configured in cassandra-topology.properties) to manage replication, you need to update that file as well so that it recognizes all nodes and their locations.
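The cross-datacenter wiring can be sketched as two generated config values. This is a minimal sketch, assuming each original cluster becomes one datacenter and every node is reachable at the listed address; the IPs (documentation ranges), the datacenter and rack names, and the helper names are hypothetical. The first function renders the IP=DC:RACK lines that PropertyFileSnitch reads from cassandra-topology.properties; the second builds a comma-separated seed string containing nodes from both datacenters, for the seeds parameter of seed_provider in cassandra.yaml.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical inventory: node IP -> (datacenter, rack).
# Each original cluster becomes its own datacenter.
NODES: Dict[str, Tuple[str, str]] = {
    "203.0.113.10": ("DC_A", "RAC1"),
    "203.0.113.11": ("DC_A", "RAC1"),
    "198.51.100.20": ("DC_B", "RAC1"),
    "198.51.100.21": ("DC_B", "RAC1"),
}


def topology_properties(nodes: Dict[str, Tuple[str, str]],
                        default: Optional[Tuple[str, str]] = None) -> str:
    """Render cassandra-topology.properties lines of the form IP=DC:RACK."""
    lines = [f"{ip}={dc}:{rack}" for ip, (dc, rack) in nodes.items()]
    if default is not None:
        # Location assumed for any node that is not listed explicitly.
        lines.append(f"default={default[0]}:{default[1]}")
    return "\n".join(lines) + "\n"


def seed_list(nodes: Dict[str, Tuple[str, str]], per_dc: int = 1) -> str:
    """Pick the first `per_dc` nodes of each datacenter as seeds."""
    chosen: Dict[str, List[str]] = {}
    for ip, (dc, _rack) in nodes.items():
        picked = chosen.setdefault(dc, [])
        if len(picked) < per_dc:
            picked.append(ip)
    return ",".join(ip for ips in chosen.values() for ip in ips)


if __name__ == "__main__":
    print(topology_properties(NODES, default=("DC_A", "RAC1")), end="")
    # Value for: seed_provider -> parameters -> seeds in cassandra.yaml
    print(f'seeds: "{seed_list(NODES)}"')
```

The same topology file has to be present on every node so that all of them agree on where each node lives.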

Finally, do a rolling restart and alter the keyspaces' replication settings so that each keyspace replicates the way you want.
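The replication change in the last step can be sketched as generating one ALTER KEYSPACE statement per keyspace. This is a minimal sketch, assuming the keyspaces use NetworkTopologyStrategy with a replication factor per datacenter; the keyspace name app_a, the datacenter names DC_A and DC_B, and the function name are hypothetical, and the datacenter names must match what the snitch reports.

```python
from typing import Dict


def alter_replication_cql(keyspace: str, rf_per_dc: Dict[str, int]) -> str:
    """Build an ALTER KEYSPACE statement that uses NetworkTopologyStrategy.

    rf_per_dc maps datacenter name -> replication factor in that datacenter.
    """
    if not rf_per_dc:
        raise ValueError("at least one datacenter is required")
    for dc, rf in rf_per_dc.items():
        if not isinstance(rf, int) or rf < 0:
            raise ValueError(f"invalid replication factor for {dc}: {rf!r}")
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += [f"'{dc}': {rf}" for dc, rf in rf_per_dc.items()]
    body = ", ".join(options)
    return f"ALTER KEYSPACE {keyspace} WITH replication = {{{body}}};"


if __name__ == "__main__":
    # Hypothetical: app_a so far lived only in DC_A and now gets
    # three replicas in each datacenter.
    print(alter_replication_cql("app_a", {"DC_A": 3, "DC_B": 3}))
    # ALTER KEYSPACE app_a WITH replication = {'class': 'NetworkTopologyStrategy', 'DC_A': 3, 'DC_B': 3};
```

Raising the replication for a datacenter that previously held no replicas of a keyspace does not copy the existing data by itself; the new replicas are typically populated afterwards with nodetool rebuild (run on the nodes of the new datacenter, naming the source datacenter) or nodetool repair.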

Update: To clarify Daniel Compton's point, when the public interfaces are enabled you need to set up encryption properly for replication between the public interfaces, and restrict access to those interfaces to only the IPs of your Cassandra nodes.
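The access restriction described in the update amounts to an allowlist in which each Cassandra node accepts internode traffic only from the other Cassandra nodes. This is a minimal sketch, assuming the default internode ports (storage_port 7000 and ssl_storage_port 7001; check cassandra.yaml for the ports your version and configuration actually use). The function name and the (source, destination, port) rule format are hypothetical and would still need to be translated into your firewall's own rule syntax.

```python
from itertools import permutations
from typing import Iterable, List, Sequence, Tuple

# Default internode ports: storage_port (7000) and ssl_storage_port (7001).
INTERNODE_PORTS: Tuple[int, ...] = (7000, 7001)


def internode_allow_rules(node_ips: Iterable[str],
                          ports: Sequence[int] = INTERNODE_PORTS
                          ) -> List[Tuple[str, str, int]]:
    """Return (source_ip, destination_ip, port) triples to allow.

    Every other source should be denied on these ports.
    """
    ips = list(dict.fromkeys(node_ips))  # de-duplicate, keep order
    # Every ordered pair of distinct nodes, on every internode port.
    return [(src, dst, port)
            for src, dst in permutations(ips, 2)
            for port in ports]


if __name__ == "__main__":
    nodes = ["203.0.113.10", "203.0.113.11", "198.51.100.20"]
    for rule in internode_allow_rules(nodes):
        print(rule)
```

The encryption half is configured separately in cassandra.yaml under server_encryption_options, where internode_encryption can be set to all, or to dc to encrypt only traffic that crosses datacenters, together with the keystore and truststore settings.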

Renaming the cluster is possible; I have gone through this whole process once before.

To rename a cluster, change the cluster name in cassandra.yaml. Then update the cluster_name stored in the system.local table on each node to match, and do a rolling restart. Details on renaming a cluster can be found here:

cassandra - Saved cluster name Test Cluster != configured name
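The per-node rename procedure can be sketched as a small plan generator. This is a minimal sketch, assuming a Cassandra version that still accepts writes to system.local (the workaround discussed in the linked question); the node address, the new cluster name and the function names are hypothetical, and the code only prints the steps rather than executing them. Run the steps on one node at a time so the rest of the cluster stays up.

```python
from typing import List


def quoted(value: str) -> str:
    """Single-quote a string, escaping embedded quotes by doubling them.

    This quoting rule is the same for CQL string literals and for
    YAML single-quoted scalars.
    """
    return "'" + value.replace("'", "''") + "'"


def rename_cluster_steps(node: str, new_name: str) -> List[str]:
    """Steps for renaming the cluster on a single node."""
    return [
        f"[{node}] cqlsh: UPDATE system.local SET cluster_name = "
        f"{quoted(new_name)} WHERE key = 'local';",
        f"[{node}] nodetool flush system",
        f"[{node}] edit cassandra.yaml: cluster_name: {quoted(new_name)}",
        f"[{node}] restart Cassandra and wait until nodetool status shows it as UN",
    ]


if __name__ == "__main__":
    for node in ["203.0.113.10", "203.0.113.11"]:
        for step in rename_cluster_steps(node, "Merged Cluster"):
            print(step)
```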

Arya
  • Keep in mind that if you give the Cassandra nodes public interfaces, they will be publicly accessible. You could follow http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html, but it is probably better to allow the two projects to access each other's resources. – Daniel Compton Nov 15 '16 at 20:38
  • This answer is HIGHLY suspect. "first step would be to rename your clusters to have the same name" - how do you propose the user do that? I strongly doubt that advice is actually possible. – Jeff Jirsa Nov 15 '16 at 20:42
  • Thank you both. I updated the answer to reflect your comments. – Arya Nov 15 '16 at 21:22

You can't join two clusters that have different names and different schemas. Bad things will happen. What you'd need to do is back up the data from one cluster, create the keyspace in the other, use sstableloader to stream the data in, and then bootstrap the new nodes in afterwards.

Because you're using a cloud, the easiest option is to temporarily add a few nodes to the new cluster, stream in the data, and then remove the old nodes/cluster. Trying to get clever by merging the clusters will be more pain than it's worth.
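The streaming step can be sketched as building the sstableloader command line for each table. This is a minimal sketch, assuming the backed-up SSTables for each table have been copied into a directory laid out as root/keyspace/table, and that the keyspace and table have already been created on the target cluster; the hosts, paths, names and the function name are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Sequence


def sstableloader_argv(target_hosts: Sequence[str], sstable_root: str,
                       keyspace: str, table: str) -> List[str]:
    """Build the sstableloader command line for one table's SSTables.

    sstableloader takes the keyspace and table names from the last two
    path components, so the files must sit in <root>/<keyspace>/<table>.
    """
    if not target_hosts:
        raise ValueError("need at least one target host")
    directory = PurePosixPath(sstable_root) / keyspace / table
    # -d lists initial contact points in the target cluster.
    return ["sstableloader", "-d", ",".join(target_hosts), str(directory)]


if __name__ == "__main__":
    argv = sstableloader_argv(["203.0.113.10", "203.0.113.11"],
                              "/backup", "app_b", "events")
    print(" ".join(argv))
    # To actually run it: subprocess.run(argv, check=True)
```

Returning an argument list rather than a shell string keeps paths with unusual characters intact if the list is passed straight to subprocess.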

If you were using bare metal and didn't want to spend capital to buy more hardware, you MAY be able to get clever, but in the cloud there's no real reason to do so.

Jeff Jirsa