
I have configured a Cassandra cluster spanning two data centers (AWS, us-east and us-west). The writes happen only to the us-east ring, and I can see the data synchronizing to the other ring. However, the lag is high.

On DC1
cqlsh:ks> select count(*) from cf1 limit 1000000;

 count
--------
 225568

On DC2
cqlsh:ks> select count(*) from cf1 limit 1000000;

 count
--------
 139964

--

  1. Why is this so, and what does it depend on?
  2. Is there a way to see the lag using any tools? Is it available to view in OpsCenter?
vrtx54234
  • What consistency level do you use for writes? Using `EACH_QUORUM` (details: http://www.datastax.com/documentation/cassandra/2.1/cassandra/dml/dml_config_consistency_c.html) might help with the lag (since writes won’t complete until both DCs acknowledge them), but it’ll increase the latency for writes. In general, you probably should look at network and disk throughput. – arre Oct 10 '14 at 21:48
  • We use LOCAL_QUORUM currently and would like to keep it for latency reasons. Will look at network and disk throughput. – vrtx54234 Oct 10 '14 at 21:49

1 Answer


Because your two DCs are in different AWS regions, some lag between them is to be expected. How much depends on the amount of data being synced across the DCs: large column families and/or a high write rate simply mean more data to replicate. LOCAL_QUORUM is the right choice for keeping write latency local, since the coordinator only waits for a quorum of replicas in the local DC and the remote DC receives the writes asynchronously. You could use a lower consistency level if you wanted, but generally speaking, if data consistency is important, the rule of thumb is to write at a higher consistency level than your reads.
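To make the consistency-level trade-off a bit more concrete, here is a rough sketch of the usual quorum arithmetic: a quorum is floor(RF / 2) + 1 replicas, and a read is guaranteed to see the latest write only when replicas read + replicas written > RF. The replication factor of 3 is an assumption (the question does not state it), and the helper names `quorum` and `overlaps` are made up for illustration.

```shell
#!/bin/sh
# Quorum size for a given replication factor: floor(RF / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) when the read and write replica sets must overlap,
# i.e. when written + read > replication factor.
# $1 = replicas written, $2 = replicas read, $3 = replication factor
overlaps() {
    [ $(( $1 + $2 )) -gt "$3" ]
}

RF=3   # ASSUMPTION: per-DC replication factor, not given in the question
Q=$(quorum "$RF")
echo "LOCAL_QUORUM with RF=$RF waits for $Q local replicas"

if overlaps "$Q" "$Q" "$RF"; then
    echo "quorum write + quorum read: overlap guaranteed"
fi
if ! overlaps 1 1 "$RF"; then
    echo "ONE write + ONE read: no overlap guarantee"
fi
```

Note that with LOCAL_QUORUM this arithmetic only applies inside the DC that handled the write. Nothing waits for the other DC, which is why a count taken there can trail behind.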

Aside from the usual OS-level tools, Cassandra does have the nodetool utility. For monitoring you can use the following nodetool commands:

nodetool netstats - (shows you if the node is streaming data) http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNetstats.html

nodetool cfstats - (shows column family stats useful for latency etc) http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCFstats.html

nodetool proxyhistograms - (shows stats from the co-ordinator nodes) http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsProxyHistograms.html
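As a rough sketch, the three commands above could be run against one node in each DC and the output compared side by side. The host addresses are hypothetical placeholders, `nodetool_cmd` is a made-up helper, and this assumes remote JMX access to those nodes is allowed. As far as I know none of these commands prints a single "lag" figure, so you would be looking at streaming activity and latency numbers rather than a direct measurement.

```shell
#!/bin/sh
# ASSUMPTION: placeholder addresses, one node per data center.
DC1_NODE="${DC1_NODE:-10.0.1.10}"
DC2_NODE="${DC2_NODE:-10.1.1.10}"

# Build the nodetool command line for one host and one subcommand.
nodetool_cmd() {
    echo "nodetool -h $1 $2"
}

for host in "$DC1_NODE" "$DC2_NODE"; do
    for sub in netstats cfstats proxyhistograms; do
        cmd=$(nodetool_cmd "$host" "$sub")
        echo "== $cmd"
        # Only run the command if nodetool is installed on this machine.
        if command -v nodetool >/dev/null 2>&1; then
            $cmd || echo "(command failed: $cmd)"
        fi
    done
done
```

Override the placeholders with real addresses via the `DC1_NODE` and `DC2_NODE` environment variables before running it.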

There are also a number of other very useful nodetool commands that you can use:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html

I'm assuming you are using Cassandra 2.0, but many of the nodetool commands are similar in other versions.

As a side note, you can also use OpsCenter, which gives a graphical view of the cluster. For more info see: http://www.datastax.com/documentation/opscenter/5.0/opsc/about_c.html

markc