I have a reporting tool that reads data from Cassandra. The configuration is: consistency level LOCAL_QUORUM, size-tiered compaction strategy, and RF=3.
When the reporting tool sends a read request to Cassandra, Cassandra by design triggers a read repair to keep the data consistent. This is a good design, but read repair is expensive and the reports are taking longer to generate.
My report users start generating reports only after 6 AM IST. Is there any way to schedule read repairs before the users start using the reports? For example, I would schedule the read repairs so that they finish before 6 AM IST, so that after 6 AM IST all the data is consistent across the cluster.
In that case, once the reports start reading data from Cassandra, the reads should not trigger read repair again, since a read repair has just finished as a scheduled job. I am fine with data becoming inconsistent because of writes/updates that happen after 6 AM IST. Which technique is good for scheduling read repairs, and do we really avoid read repairs if they were done recently? -Suyodha