
I maintain a Cassandra cluster with 2 data centers, and I am going to add a new data center to that existing cluster. After rebuilding the data, how can I verify the consistency of the data in the new data center?


1 Answer


Reading with LOCAL_QUORUM from each DC and comparing the results would be the most straightforward approach.
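As a rough sketch of that approach (not a finished tool), the Python below uses the DataStax `cassandra-driver` to read a table at LOCAL_QUORUM through coordinators in one DC, then diffs the result sets from two DCs. The function names `fetch_rows` and `diff_rows`, and the assumption that the primary key columns come first in the SELECT list, are illustrative choices and not part of the original answer. It also assumes the table fits in memory; for a large table you would want to page through it by token range instead.

```python
def fetch_rows(contact_points, local_dc, keyspace, query):
    """Read every row of `query` at LOCAL_QUORUM via coordinators in `local_dc`."""
    # Imported here so diff_rows() stays usable without the driver installed.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy

    # LOCAL_QUORUM is relative to the coordinator's DC, so pin the
    # load-balancing policy to the DC we want to check.
    profile = ExecutionProfile(
        load_balancing_policy=DCAwareRoundRobinPolicy(local_dc=local_dc),
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    cluster = Cluster(
        contact_points=contact_points,
        execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    )
    try:
        session = cluster.connect(keyspace)
        return [tuple(row) for row in session.execute(query)]
    finally:
        cluster.shutdown()


def diff_rows(rows_a, rows_b, key_len=1):
    """Compare two row lists, keyed on the first `key_len` columns.

    Returns (keys missing in b, keys missing in a, keys whose rows differ).
    """
    a = {row[:key_len]: row for row in rows_a}
    b = {row[:key_len]: row for row in rows_b}
    missing_in_b = sorted(k for k in a if k not in b)
    missing_in_a = sorted(k for k in b if k not in a)
    mismatched = sorted(k for k in a if k in b and a[k] != b[k])
    return missing_in_b, missing_in_a, mismatched
```

You would call `fetch_rows` once per DC with that DC's name and contact points (for example a hypothetical `fetch_rows(["10.0.0.1"], "dc1", "my_ks", "SELECT id, value FROM my_table")`) and pass the two results to `diff_rows`; three empty lists mean the two DCs returned the same data for that query.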

A repair builds hashes of partitions from the SSTables in a compaction task and compares them by range, which is more efficient than reading and comparing the data individually. You could pull that part out of the Cassandra code to build a tool that does the same thing, or you can just run a full (not incremental) repair, which logs the differences it finds.
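To illustrate why comparing hashes by range is cheaper than comparing rows, here is a simplified sketch. It is not Cassandra's repair code and does not build a real Merkle tree: it just assigns each row to one of a fixed number of buckets by hashing its first column, keeps one digest per bucket, and compares only the digests. The names `range_digests` and `differing_ranges` and the bucket count are hypothetical.

```python
import hashlib


def range_digests(rows, num_ranges=16):
    """Return {bucket index: hex digest of all rows whose key hashes into it}."""
    buckets = {}
    # Sort first so both sides feed rows into each digest in the same order.
    for row in sorted(rows, key=repr):
        key_hash = hashlib.md5(repr(row[0]).encode()).digest()
        bucket = int.from_bytes(key_hash[:4], "big") % num_ranges
        buckets.setdefault(bucket, hashlib.sha256()).update(repr(row).encode())
    return {b: h.hexdigest() for b, h in buckets.items()}


def differing_ranges(digests_a, digests_b):
    """List the buckets whose digests disagree or exist on only one side."""
    all_buckets = digests_a.keys() | digests_b.keys()
    return sorted(b for b in all_buckets if digests_a.get(b) != digests_b.get(b))
```

Each side only has to exchange `num_ranges` digests, and only the buckets returned by `differing_ranges` need a row-by-row comparison afterwards.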

Chris Lohfink
  • Both suggestions are interesting. I guess the first one depends on the size of the data set; the second one sounds like a fun project. – raam86 Aug 03 '17 at 14:45
  • Running a full repair will be an IO-intensive task. Any other suggestions? I have heard we could run a Spark job to do this. Any idea on that? – Rishikesan Varudharajan Aug 04 '17 at 10:34
  • A Spark job would read all the data as well. The difference is that after reading the data, the repair job only sends a Merkle tree (hashes) of the data to be compared, while Spark streams all of the data over to be compared. But if you want to know the specifics, a Spark job or a script that reads at LOCAL_QUORUM will give you more detail. – Chris Lohfink Aug 04 '17 at 16:16