I have a reporting tool that reads data from Cassandra. The configuration is: consistency level LOCAL_QUORUM, size-tiered compaction strategy, and RF=3.

When the reporting tool sends a read request to Cassandra, Cassandra by design triggers a read repair to keep the data consistent. That is a good design, but read repair is expensive and the reports are taking longer to generate.

My report users start generating reports only after 6 AM IST. Is there any way to schedule read repairs before the users start using the reports? For example, I would schedule the read repairs so that they finish before 6 AM IST, and after 6 AM IST all the data would be consistent across the cluster.

In that case, once a report starts reading data from Cassandra, it should not trigger read repair again, since we just finished it as a scheduled job. I am fine with inconsistencies caused by writes/updates that happen after 6 AM IST. Which technique is good for scheduling read repairs, and do we really avoid read repairs if they were done recently? -Suyodha

2 Answers

If you run traditional anti-entropy repair beforehand, you could then do reads at consistency level ONE.

There are many ways to do anti-entropy repair. The most obvious is nodetool repair (likely with `nodetool repair -par -inc` or similar command-line switches). Alternatively, third-party tools can repair small ranges, such as the Cassandra Range Repair tool maintained by Brian Gallew or Spotify's Cassandra Reaper.
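
To make the scheduling idea concrete, here is a minimal sketch of a wrapper that a scheduler such as cron could run on each node well before 6 AM IST. The function names and the `reports` keyspace used in the note below are hypothetical, and the default switches are simply the ones mentioned above; which switches are accepted depends on your Cassandra version, so treat them as an assumption to verify.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_repair_command(
    keyspace: str,
    tables: Optional[Sequence[str]] = None,
    flags: Sequence[str] = ("-par", "-inc"),  # assumption: switches from the answer above
) -> List[str]:
    """Build the argument list for an anti-entropy repair of one keyspace."""
    cmd = ["nodetool", "repair", *flags, keyspace]
    if tables:
        # Restrict the repair to specific tables when they are given.
        cmd.extend(tables)
    return cmd


def run_repair(keyspace: str, tables: Optional[Sequence[str]] = None) -> int:
    """Run the repair on the local node and return nodetool's exit code."""
    if shutil.which("nodetool") is None:
        raise RuntimeError("nodetool was not found on PATH")
    completed = subprocess.run(build_repair_command(keyspace, tables), check=False)
    return completed.returncode
```

Calling `run_repair("reports")` from a nightly job would start the repair. How long it takes depends on data volume, so the start time needs enough margin to finish before the reporting window opens.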

Jeff Jirsa

What makes you think read repairs are what's slowing it down? Check (via JMX) org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBackground and org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking to verify whether read repairs are even occurring. Read repairs only kick off if the data read is inconsistent across replicas, which shouldn't be that common.
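
If a JMX client is not at hand, a rough alternative is to look at the read repair counters that `nodetool netstats` prints. The sketch below parses that output; it assumes a "Read Repair Statistics" section with `Attempted`, `Mismatch (Blocking)` and `Mismatch (Background)` lines, so compare it against what your version actually prints. The function name is hypothetical.

```python
import re
from typing import Dict

# Assumed line layout, e.g. "Mismatch (Blocking): 12"
_STAT_LINE = re.compile(
    r"^(Attempted|Mismatch \((?:Blocking|Background)\)):\s*(\d+)$"
)


def parse_read_repair_stats(netstats_output: str) -> Dict[str, int]:
    """Extract the read repair counters from `nodetool netstats` text output."""
    stats: Dict[str, int] = {}
    for line in netstats_output.splitlines():
        match = _STAT_LINE.match(line.strip())
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats
```

Sampling these counters before and after a report run shows whether the mismatch numbers move at all. If they stay flat, read repair is unlikely to be what slows the reports down.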

If it's really a problem, you can disable read repairs on a table by setting the chance to 0.

ALTER TABLE yourtable WITH read_repair_chance = 0;
Chris Lohfink
  • Hi Chris and Jeff, Thanks for looking into it. I will come back to you guys on this as soon as possible. Thanks a lot. – Madhu Mohan Kommu Aug 01 '16 at 02:50
  • Disabling read repair with ALTER TABLE only protects against background read repair, which is nonblocking. Foreground read repair can only be disabled by using a lower consistency level. – Jeff Jirsa Aug 01 '16 at 19:59
  • That's true, you need both CL.ONE and `read_repair_chance=0` to prevent both blocking and background read repairs. Doing only one would still let the other occur. I think it's still pretty likely that this isn't the real cause of the reports taking a long time – Chris Lohfink Aug 01 '16 at 21:03
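
The combination described in these comments, a read at consistency level ONE plus a table with its read repair chance set to 0, could look roughly like the sketch below. It assumes the DataStax Python driver (`cassandra-driver`) and a Cassandra version before 4.0, where the `read_repair_chance` and `dclocal_read_repair_chance` table options still exist. Setting the second option is an addition not mentioned in the thread, and the function names and the keyspace/table in the note below are hypothetical.

```python
def disable_background_read_repair_cql(keyspace: str, table: str) -> str:
    """Build an ALTER TABLE statement that sets both read repair chances to 0."""
    return (
        f"ALTER TABLE {keyspace}.{table} "
        "WITH read_repair_chance = 0 AND dclocal_read_repair_chance = 0;"
    )


def run_report_query(session, cql: str):
    """Execute a read at consistency level ONE on an existing driver session."""
    # Imported lazily so the statement builder above works without the driver installed.
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    # At ONE the coordinator waits for a single replica, so there is no
    # digest comparison that could trigger a blocking read repair.
    statement = SimpleStatement(cql, consistency_level=ConsistencyLevel.ONE)
    return session.execute(statement)
```

For example, `session.execute(disable_background_read_repair_cql("reports", "daily"))` would apply the table change once, after which the reporting tool would issue its reads through `run_report_query`. Reads at ONE can return stale data for anything written after the last scheduled repair, which the question states is acceptable.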