
I have a multi-datacenter environment (DC1, DC2) with 3 nodes in each datacenter and RF=3 per datacenter.

  1. I would like to know whether triggers can be used in production in a multi-datacenter environment. If so, how can this be achieved?

  2. Case A: If I start inserting the data into DC1, it will have 3 replicas within DC1, and DC1 is responsible for replicating the data to the other datacenter, DC2. Every time an insert into DC2 takes place, I would like a trigger event to fire and notify the application of the latest inserted value. Is this possible?

  3. Case B: If point 2 is not possible, is it a good idea to insert the data simultaneously into both datacenters, DC1 and DC2 (targeting a single table), and avoid triggers altogether? Will this have any impact on network traffic? Since the latest timestamp wins, the table would hold the last insert, which serves the purpose when queried from either region.
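The "latest timestamp wins" idea in Case B can be illustrated with a minimal last-write-wins sketch. `Cell` and `reconcile` are hypothetical names, and the tie-break on value is only there to keep the sketch deterministic; it is not a claim about Cassandra's exact tie rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp, e.g. microseconds since the epoch


def reconcile(*versions: Cell) -> Cell:
    """Last write wins: keep the version with the highest write timestamp.

    Ties are broken by comparing values only to keep this sketch
    deterministic; it is not a claim about Cassandra's exact tie rule.
    """
    return max(versions, key=lambda c: (c.timestamp, c.value))


# Hypothetical: the same row written through DC1, then 50 microseconds later through DC2.
from_dc1 = Cell("status=shipped", 1_455_300_000_000_000)
from_dc2 = Cell("status=delivered", 1_455_300_000_000_050)
print(reconcile(from_dc1, from_dc2))
# Cell(value='status=delivered', timestamp=1455300000000050)
```

Because only the timestamp decides the winner, clock skew between the clients or coordinators in the two regions can let an older write win.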

Read consistency level: LOCAL_QUORUM
Write consistency level: ONE
DSE version: 4.8.2

With these consistency levels, good consistency can be achieved while lowering the latency of write operations across the datacenters.
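One way to sanity-check this is the common R + W > RF rule: a read is only guaranteed to overlap the latest write if the number of replicas acknowledging the read plus those acknowledging the write exceeds the replication factor. The sketch below assumes both sets of acknowledgements come from the same (local) DC, which a write at CL ONE does not guarantee; the function names are illustrative only.

```python
def local_quorum(rf_local: int) -> int:
    """Replicas that must acknowledge a LOCAL_QUORUM request in one DC."""
    return rf_local // 2 + 1


def overlaps(read_acks: int, write_acks: int, rf_local: int) -> bool:
    """True if every read replica set must intersect every write replica set.

    Assumes reads and writes are both acknowledged within the same DC.
    """
    return read_acks + write_acks > rf_local


RF_LOCAL = 3
R = local_quorum(RF_LOCAL)  # LOCAL_QUORUM read -> 2 replicas
W = 1                       # write at ONE -> 1 replica
print(R, W, overlaps(R, W, RF_LOCAL))  # 2 1 False
print(overlaps(local_quorum(RF_LOCAL), local_quorum(RF_LOCAL), RF_LOCAL))  # True
```

Under these assumptions, ONE writes with LOCAL_QUORUM reads (2 + 1 = 3) do not guarantee that a read sees the latest write; LOCAL_QUORUM on both sides (2 + 2 = 4) does.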

Use case:

We have an application (2 domains) for two different regions (DC1 and DC2). Users in the DC1 region use domain 1 to access the application, and users in the DC2 region use domain 2. The data is ingested into DC1 for that region, and as it is replicated within DC1, the DC1 coordinator replicates the data to the other DC (DC2). The moment DC2 receives the data from DC1, we want to let the application know about the latest available information (polling) using some trigger-event mechanism. I just want to know whether this can be implemented with Cassandra triggers.

Can someone give feedback on Case A and Case B, and say which would be more efficient in production? Thanks.

Arun

2 Answers


In either case stated above, I am not sure why you want to use a trigger to notify your application that a value was inserted. In the scenario as I understand it, your application already knows the newest value. Once the write has succeeded, you can notify your application of the newest value.
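A minimal sketch of that write-then-notify pattern, assuming the application performs the write itself. `InMemoryTable`, `write_then_notify` and the `notify` callback are hypothetical names; the in-memory dict only stands in for a real Cassandra session.

```python
from typing import Callable, Dict, List, Tuple


class InMemoryTable:
    """Hypothetical stand-in for a Cassandra table.

    In a real application insert() would be a driver call such as
    session.execute(...) issued at the consistency level you need.
    """

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def insert(self, key: str, value: str) -> None:
        self.rows[key] = value


def write_then_notify(table: InMemoryTable, key: str, value: str,
                      notify: Callable[[str, str], None]) -> None:
    """Notify the application only after the write call has returned.

    If insert() raises (e.g. a write timeout), the exception propagates
    and notify() is never called.
    """
    table.insert(key, value)
    notify(key, value)


received: List[Tuple[str, str]] = []
write_then_notify(InMemoryTable(), "user:42", "latest-value",
                  lambda k, v: received.append((k, v)))
print(received)  # [('user:42', 'latest-value')]
```

The notification is tied to the client's own successful write, so no server-side hook is needed to learn about a value the application just wrote.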

In both cases A and B you are working against some of the basic principles of how Cassandra functions. At the application level you should not need to worry about ensuring replication or eventual consistency of your data across multiple nodes and data centers. That is a large part of what Cassandra brings to the table.

In both Case A and Case B you are going to get multiple inserts of the same data on every node it is replicated to, in both data centers. As you write to DC1, the data will also be written to DC2. If you then write the same data to DC2, it will be written back to DC1. This leaves many copies of the same rows on disk (until compaction merges them), which increases disk requirements and compaction frequency. It will also increase network traffic as the two DCs talk back and forth to reach eventual consistency.
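A rough count of the extra work, as a sketch: it assumes every client write is applied on all RF replicas in every DC of the keyspace, and it counts replica-level mutations rather than actual WAN messages. `replica_mutations` is a hypothetical helper.

```python
from typing import Dict, List

RF_PER_DC: Dict[str, int] = {"DC1": 3, "DC2": 3}


def replica_mutations(rf_per_dc: Dict[str, int], entry_dcs: List[str]) -> Dict[str, int]:
    """Count replica-level mutations for one logical row.

    entry_dcs lists the DC that receives each client write of that row.
    Every client write is applied on all RF replicas in every DC; the
    replicas outside the entry DC are the ones reached across DCs.
    """
    total = sum(rf_per_dc.values())
    cross_dc = sum(total - rf_per_dc[dc] for dc in entry_dcs)
    return {"total": total * len(entry_dcs), "cross_dc": cross_dc}


print(replica_mutations(RF_PER_DC, ["DC1"]))         # {'total': 6, 'cross_dc': 3}
print(replica_mutations(RF_PER_DC, ["DC1", "DC2"]))  # {'total': 12, 'cross_dc': 6}
```

Writing the same row through both DCs doubles both the replica writes and the cross-DC replication for data that Cassandra would already have replicated from a single write.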

From what I can see here, I also have to ask why you are using RF=3 on a 3-node cluster (per data center). This means that each node in each data center will hold all the data, essentially making each server a complete replica of the others. This may be overkill (depending on the data, of course), as you will not get many of the scalability benefits that Cassandra offers.
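The storage arithmetic behind this, as a sketch assuming evenly balanced token ownership; the 100 GB data set is a hypothetical figure, and compaction headroom comes on top of the result.

```python
def share_per_node(rf: int, nodes_in_dc: int) -> float:
    """Fraction of the data set each node in one DC stores.

    Assumes evenly balanced token ownership. A node never holds two replicas
    of the same partition, so an RF at or above the node count means every
    node stores everything.
    """
    return min(rf, nodes_in_dc) / nodes_in_dc


def per_node_gb(data_gb: float, rf: int, nodes_in_dc: int) -> float:
    """Approximate live data per node, before compaction headroom."""
    return data_gb * share_per_node(rf, nodes_in_dc)


# Hypothetical 100 GB data set, RF=3 in each DC.
print(per_node_gb(100, 3, 3))  # 100.0 (3 nodes: every node has a full copy)
print(per_node_gb(100, 3, 6))  # 50.0
```

Adding nodes to a DC while keeping RF=3 is what spreads the data, which is where the scalability benefit mentioned above comes from.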

Cassandra will handle the syncing of data between the data centers and across nodes so your application does not need to worry about this.

One other quick note: currently your writes use CL=ONE. This means that you may end up with cross-DC latency on a write request. If you change this to LOCAL_ONE, the request is only considered complete once one of the nodes in the local DC has written the value, rather than possibly a node in the other DC. Cassandra will still handle the replication and syncing of the data.
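Which replica can complete the request under each level can be modelled as below. This is a simplified sketch: it assumes the write reaches replicas in both DCs and that the coordinator returns on the first eligible acknowledgement. `completing_ack` and the latency figures are hypothetical.

```python
from typing import List, Tuple

Ack = Tuple[str, float]  # (data center of the replica, ack latency in ms)


def completing_ack(acks: List[Ack], cl: str, local_dc: str) -> Ack:
    """Return the replica acknowledgement that completes a single-replica CL.

    ONE counts acks from replicas in any DC; LOCAL_ONE only counts acks
    from replicas in the coordinator's local DC.
    """
    if cl == "ONE":
        eligible = acks
    elif cl == "LOCAL_ONE":
        eligible = [ack for ack in acks if ack[0] == local_dc]
    else:
        raise ValueError(f"unsupported consistency level: {cl}")
    if not eligible:
        raise RuntimeError(f"{cl} not satisfied: no eligible replica acknowledged")
    return min(eligible, key=lambda ack: ack[1])


# Hypothetical timings: the DC1 replicas are busy, a DC2 replica answers first.
acks = [("DC1", 40.0), ("DC1", 45.0), ("DC1", 50.0),
        ("DC2", 30.0), ("DC2", 60.0), ("DC2", 70.0)]
print(completing_ack(acks, "ONE", "DC1"))        # ('DC2', 30.0)
print(completing_ack(acks, "LOCAL_ONE", "DC1"))  # ('DC1', 40.0)
```

With ONE, the write can be reported as successful before any DC1 replica has confirmed it, so a LOCAL_* read in DC1 right afterwards may miss it; LOCAL_ONE requires that confirmation to come from DC1.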

bechbd
  • Thanks for your feedback. We had the same opinion on Case B about inserting data in multiple regions, and I agree with you about the data duplication, network latency, etc. But I want Case A to be implemented based on our use case. Please see the updated use case in the question. – Arun Feb 12 '16 at 20:23
  • While this might work, triggers are called before the actual data mutation takes place so this may or may not work in your specific use case depending on your requirements. – bechbd Feb 17 '16 at 18:22
  • Is there a command to see the network traffic when the two DCs talk to each other? In OpsCenter, I see the OS:Net Sent and OS:Net Received metrics. Is that how we know the traffic? Thank you – Arun Feb 17 '16 at 21:24
  • Depending on exactly what you are trying to see you could use `nodetool netstats` https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsNetstats.html – bechbd Feb 18 '16 at 03:08
  • Can you please list the downsides of using RF=3 for a 6-node cluster with 3 nodes per datacenter and a replication factor of 3 per datacenter? We use LOCAL_QUORUM as the write CL and LOCAL_ONE as the read CL. – Arun Feb 25 '16 at 21:03
  • How will the disk usage increase when we insert the data from 2 applications (2 regions, DC1 and DC2), given that we will be inserting the data into the same table? Even if we insert into only 1 region, in a multi-DC environment the same data would be replicated across the DCs, I believe. – Arun Feb 25 '16 at 21:47
  • If you have a 3-node cluster (per DC) and RF=3, then each node will have a copy of all the data. This means that the disk space used per node would be the entire size of your data plus the space needed for running a compaction. If you have 2 DCs replicating data to each other, then if you insert data into DC1 it will replicate into DC2. – bechbd Feb 26 '16 at 16:53

Generally, multiple data centers are used for workload separation (say, different DCs for real-time queries, analytics, and search). Cassandra itself takes care of replicating the data across multiple DCs. So, coming to your question, Case B doesn't seem to be the right option because:

  1. Cassandra automatically replicates data across multiple DCs.
  2. Case A is feasible (see alerts/notifications using triggers).

Hope this helps.

Mayank Raghav