
My question is a follow-up to this topic: Cassandra load balancing with TokenAwarePolicy and shuffleReplicas

I'm encountering some issues regarding the TokenAwarePolicy that I don't understand.

Cluster configuration:

  • 3 nodes
  • Replication factor = 3
  • Load balancing policy: new TokenAwarePolicy(new RoundRobinPolicy(), false) (see the sketch after this list)
  • Consistency level (Reads/Writes): ONE
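
For reference, a minimal sketch (DataStax Java driver 3.x) of a `Cluster` built with this exact configuration; the contact point and keyspace name are placeholders:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class ClusterSetup {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                // Token-aware routing; shuffleReplicas = false means the
                // primary replica for a partition is always tried first.
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(new RoundRobinPolicy(), false))
                // Default consistency level ONE for both reads and writes.
                .withQueryOptions(
                        new QueryOptions().setConsistencyLevel(ConsistencyLevel.ONE))
                .build();
        Session session = cluster.connect("my_keyspace"); // placeholder keyspace
    }
}
```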

Shuffling replicas is set to false on purpose. The problem is that I'm encountering consistency issues when reading data in my application. For instance (a minimal sketch of these steps follows the list):

  • Insert 10 entities
  • Do some other operations with the DB... (insert other entities)
  • Select the previously created entities: all fields are listed explicitly in the 'SELECT' clause (not 'SELECT *'), and the primary key is present in the 'WHERE' clause.
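
A minimal sketch of these steps (the `entities` table and its columns are hypothetical; `session`, along with the `java.util` and driver imports, comes from the setup sketch above):

```java
// Hypothetical schema: CREATE TABLE entities (id uuid PRIMARY KEY, payload text)
PreparedStatement insert = session.prepare(
        "INSERT INTO entities (id, payload) VALUES (?, ?)");
PreparedStatement select = session.prepare(
        "SELECT id, payload FROM entities WHERE id = ?"); // explicit columns, PK in WHERE

// Step 1: insert 10 entities.
List<UUID> ids = new ArrayList<>();
for (int i = 0; i < 10; i++) {
    UUID id = UUID.randomUUID();
    ids.add(id);
    session.execute(insert.bind(id, "entity-" + i));
}

// Step 2: other operations against the DB...

// Step 3: read the entities back and count them.
int found = 0;
for (UUID id : ids) {
    if (session.execute(select.bind(id)).one() != null) {
        found++;
    }
}
System.out.println("found " + found + " of 10"); // sometimes < 10 under load
```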

Result: sometimes I get the right number of entities (10), but sometimes fewer (from 0 to 9).

Notice:

  • I'm using BoundStatements only.
  • I'm not using DataStax's asynchronous methods.
  • I have checked that the routing key was not null on the 'failed' queries; it wasn't (see the snippet after this list).
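
For reference, this is roughly how the routing key can be inspected with the 3.x driver (`select` and `ids` refer to the sketch above; `ByteBuffer` is `java.nio.ByteBuffer`):

```java
// In driver 3.x the routing key is derived from the bound partition-key values.
BoundStatement bs = select.bind(ids.get(0));
ByteBuffer routingKey = bs.getRoutingKey(
        ProtocolVersion.NEWEST_SUPPORTED, CodecRegistry.DEFAULT_INSTANCE);
if (routingKey == null) {
    // A null routing key makes TokenAwarePolicy fall back to the child
    // RoundRobinPolicy, i.e. token awareness would be silently bypassed.
    System.err.println("routing key is null!");
}
```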

I have to admit that I'm targeting the DB with a heavy load (30 threads, each running the above sample X times concurrently; a sketch of this load pattern follows), but I still don't understand why the driver is not querying the right node, giving me stale data.
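
A hypothetical driver for that load pattern, assuming `runSample()` wraps the insert/select sequence above (the concurrency classes come from `java.util.concurrent`):

```java
// 30 threads, each running the sample x times (x stands in for the
// unspecified X from the description above).
static void runLoad(int x) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(30);
    for (int t = 0; t < 30; t++) {
        pool.submit(() -> {
            for (int run = 0; run < x; run++) {
                runSample(); // hypothetical: the insert/select sequence
            }
        });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
}
```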

Thanks for your answer.

    Your reads might be racing the writes to disk (and losing, i.e. reading before the writes complete). You might try, as an experiment, using a consistency level of `LOCAL_QUORUM` for your writes, and keep using a consistency level of `ONE` for your reads (assuming you _need_ those fast reads). – Castaglia Jun 10 '16 at 16:15
  • There's a shuffle false flag in token aware to always get the primary replica. Now, make sure you think about the load implications of always writing to and reading from (i.e. beating up) your primary replicas – phact Jun 10 '16 at 17:45
  • @Castaglia Yes, of course it works if I use LOCAL_QUORUM for writes, but I lose performance. My question is more about the shuffle false flag phact is talking about: as you can see in my question I use this flag, but I have the feeling that it doesn't work as expected: I don't notice any significant hotspot, and I read stale data – Barnabé Faliu Jun 13 '16 at 09:24
  • With an RF of 3, there are 3 possible nodes which contain the data, according to the `TokenAwarePolicy`; your client only waits for the _first_ one of those nodes to acknowledge the write. A subsequent read _could_ be directed to the _last_ (_i.e._ the _slowest_) of those nodes (hence the stale data), particularly if the primary replica-bearing node is always heavily loaded, due to the lack of shuffling. – Castaglia Jun 13 '16 at 15:28
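
Following Castaglia's suggestion above, the write consistency level can be raised per statement while reads stay at ONE. Note that strictly consistent reads require W + R > RF; with RF = 3, LOCAL_QUORUM writes (W = 2) plus ONE reads (R = 1) gives exactly 3, so this helps in practice but is not a hard guarantee (QUORUM on both sides would be). A sketch, reusing the prepared statements from the snippets above:

```java
// Write at LOCAL_QUORUM: wait for 2 of the 3 replicas to acknowledge.
BoundStatement write = insert.bind(id, "entity-x"); // id: hypothetical UUID
write.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
session.execute(write);

// Read stays at the session default (ONE) for speed.
session.execute(select.bind(id));
```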

0 Answers