I would like to know how to use CDC (change data capture) in Cassandra. I found that it has been implemented since version 3.8 (https://issues.apache.org/jira/browse/CASSANDRA-8844). Are there any examples of usage?

M--

3 Answers

1. Enable CDC in cassandra.yaml

cdc_enabled (default: false)
Enable or disable CDC operations node-wide.

2. Enabling CDC on a table

CREATE TABLE foo (a int, b text, PRIMARY KEY(a)) WITH cdc=true;
// or
ALTER TABLE foo WITH cdc=true;

3. After the memtable is flushed to disk, you can access the raw CDC data in $CASSANDRA_HOME/data/cdc_raw

Cassandra stores CommitLogSegments in this folder. You can check this link: Read CommitLogSegments
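To illustrate step 3, here is a minimal sketch of a consumer loop over the cdc_raw folder. It assumes segment files are named like `CommitLog-<version>-<id>.log` and that the consumer is responsible for deleting segments once it has processed them. `handle_segment` is a hypothetical placeholder: actually decoding a segment requires Cassandra's Java CommitLogReader, which this sketch does not attempt.

```python
import re
from pathlib import Path
from typing import Callable, List

# Assumed file name pattern for commit log segments: CommitLog-<version>-<id>.log
SEGMENT_NAME = re.compile(r"^CommitLog-(\d+)-(\d+)\.log$")


def pending_segments(cdc_raw_dir: Path) -> List[Path]:
    """Return the segment files in cdc_raw_dir, ordered by numeric segment id."""
    found = []
    for path in cdc_raw_dir.iterdir():
        match = SEGMENT_NAME.match(path.name)
        if match and path.is_file():
            found.append((int(match.group(2)), path))
    return [path for _, path in sorted(found)]


def drain(cdc_raw_dir: Path, handle_segment: Callable[[Path], None]) -> int:
    """Pass each segment to handle_segment, then delete it. Returns the count."""
    processed = 0
    for segment in pending_segments(cdc_raw_dir):
        handle_segment(segment)  # hypothetical: hand off to a real segment parser
        segment.unlink()         # free the space only after successful handling
        processed += 1
    return processed
```

You would call something like `drain(Path("/var/lib/cassandra/cdc_raw"), my_handler)` periodically (the path here is only an example). If the handler raises, the segment is left in place, so it is picked up again on the next run.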

Read more: https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/doc/source/operating/cdc.rst

Ashraful Islam

You can write your own implementation of CommitLogReader, or use this sample implementation.

However, please note that CDC logs are not very reliable (because of duplicate events and the time it takes for data to be flushed to CDC), and their format is subject to change in future releases.
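If duplicates are a concern (for instance because every replica records the same write in its own commit log), one option is to deduplicate on the consumer side. Below is a rough sketch, not anything Cassandra provides: it assumes you have already decoded each change into a hypothetical `(table, primary key, write timestamp)` tuple, and it remembers only a bounded number of recent events.

```python
from collections import OrderedDict
from typing import Hashable, Iterable, Iterator, Tuple

# Hypothetical event shape: (table name, primary key, write timestamp in microseconds)
Event = Tuple[str, Hashable, int]


class RecentlySeen:
    """Bounded memory of recent events; the oldest entries are evicted first."""

    def __init__(self, capacity: int = 100_000) -> None:
        self._capacity = capacity
        self._seen: "OrderedDict[Event, None]" = OrderedDict()

    def first_time(self, event: Event) -> bool:
        """Return True if the event has not been seen recently, and remember it."""
        if event in self._seen:
            self._seen.move_to_end(event)  # keep frequently repeated events around
            return False
        self._seen[event] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # drop the least recently seen event
        return True


def deduplicate(events: Iterable[Event], seen: RecentlySeen) -> Iterator[Event]:
    """Yield each event only the first time it appears within the memory window."""
    for event in events:
        if seen.first_time(event):
            yield event
```

Note the trade-off: two genuinely different writes that share the same key and timestamp are collapsed into one, and a duplicate that arrives after its original has been evicted gets through.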

Arry

I work at ScyllaDB, which is Cassandra-compatible and also has CDC support that is simpler to use.

You can specify whether you wish to get only the delta, the pre-image, or the post-image. The data is stored in a system-generated table and can be read via CQL.

As such:

  • there is no need to write and deploy code on the Cassandra nodes to consume commit logs (nor is there a need to flush to get them)
  • deduplication is inherent to the solution.

You can read more at https://docs.scylladb.com/using-scylla/cdc/

Shlomi Livne