I have a 6-node Cassandra cluster with a replication factor of 3. No nodes were added, removed, or down in recent days. My application has three Kafka topics and four Storm topologies:
- Topology-1 gets events from Topic-1 and stores them in Cassandra
- Topology-2 gets the same events from Topic-1 and in certain cases sends an event to Topic-2
- Topology-3 listens to Topic-2 and, for every consumed event, fetches a list of other events previously stored in Cassandra, packs them into a single event and sends it to Topic-3
- Topology-4 gets lists of events from Topic-3, performs some calculations and stores the result in another database (Elasticsearch).
All of the topologies are implemented in Java using DataStax driver version 2.1.7.1 and its object mapping API; the Cassandra version is 2.2.3.
Coming back from the weekend, I noticed the following:
- A lot of errors in the log files from topologies 3 and 4 complaining about missing events in Cassandra.
- No complaints from topology 1 about being unable to write to Cassandra.
That looked very much like a consistency level issue: all of my read and write queries were running at consistency level ONE, so I switched them to QUORUM. (Also noteworthy: I had to restart the Storm topologies for the change to take effect.) After that everything went back to normal: topology 1 writes events, topology 3 reads them, and everything is consistent.
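For reference, here is the overlap rule behind that switch. A read is guaranteed to see an acknowledged write only if the number of replicas written (W) plus the number of replicas read (R) exceeds the replication factor (RF). The minimal Java sketch below only does that arithmetic for RF = 3. `quorum` and `overlaps` are hypothetical helpers, not part of the DataStax driver API.

```java
public class ConsistencyOverlap {

    // Number of replicas that must respond for QUORUM: floor(RF / 2) + 1.
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // The replicas that acknowledged the write and the replicas consulted by the
    // read are guaranteed to intersect only when R + W > RF.
    static boolean overlaps(int writeReplicas, int readReplicas, int rf) {
        return writeReplicas + readReplicas > rf;
    }

    public static void main(String[] args) {
        int rf = 3;   // replication factor from the cluster described above
        int one = 1;  // replicas involved at consistency level ONE
        int q = quorum(rf);

        System.out.println("QUORUM for RF=3: " + q);
        System.out.println("ONE write / ONE read overlap: " + overlaps(one, one, rf));
        System.out.println("QUORUM write / QUORUM read overlap: " + overlaps(q, q, rf));
        System.out.println("ONE write / ALL read overlap: " + overlaps(one, rf, rf));
    }
}
```

With ONE/ONE, 1 + 1 = 2 is not greater than 3, so a read can be served by a replica that never received the write. QUORUM/QUORUM gives 2 + 2 = 4 > 3. The last case also satisfies the rule: a write at ONE followed by a read at ALL should overlap on at least one replica that acknowledged the write.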
BUT.
I'm still missing many of the events from the several hours before I applied the consistency level fix. I don't see them in Cassandra even when querying manually with cqlsh at consistency level ALL!
Could inserts performed at consistency level ONE have been lost, or where else should I look for the root cause?
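To make the ALL observation concrete, consider a simplified last-write-wins model. A read at ALL consults every replica and returns the newest version any of them holds, so a row that is absent at ALL is one that no replica returned. The toy Java model below assumes exactly that simplification and leaves out digests, read repair, deletions and timeouts. `Replica`, `Cell` and `readAll` are hypothetical names, not Cassandra or driver classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AllReadModel {

    // One stored version of a row: its value plus the write timestamp.
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Toy replica: partition key -> newest cell this replica has received.
    static final class Replica {
        final Map<String, Cell> cells = new HashMap<>();

        void apply(String key, Cell cell) {
            Cell current = cells.get(key);
            if (current == null || cell.timestamp > current.timestamp) {
                cells.put(key, cell);
            }
        }
    }

    // Read at ALL: ask every replica and keep the version with the highest timestamp
    // (ties keep the first one seen; real Cassandra has its own tie-breaking rules).
    // Returns null only if no replica holds the key at all.
    static String readAll(List<Replica> replicas, String key) {
        Cell winner = null;
        for (Replica r : replicas) {
            Cell c = r.cells.get(key);
            if (c != null && (winner == null || c.timestamp > winner.timestamp)) {
                winner = c;
            }
        }
        return winner == null ? null : winner.value;
    }

    public static void main(String[] args) {
        List<Replica> replicas = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            replicas.add(new Replica());  // RF = 3
        }

        // A write at ONE that reached only the first replica is still visible at ALL.
        replicas.get(0).apply("event-1", new Cell("payload", 100L));
        System.out.println("event-1 at ALL: " + readAll(replicas, "event-1"));

        // A write that reached no replica is missing even at ALL.
        System.out.println("event-2 at ALL: " + readAll(replicas, "event-2"));

        // A newer write on another replica wins the reconciliation.
        replicas.get(2).apply("event-1", new Cell("payload-v2", 200L));
        System.out.println("event-1 after newer write: " + readAll(replicas, "event-1"));
    }
}
```

Under this model, an event that is missing at ALL is one whose insert reached none of the three replicas.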
Thanks in advance!