
We have a stream of data arriving in Table A every 10 minutes, with no history preserved. The existing data has to be flushed to a new table, B, every time data is loaded into Table A. Can this be done dynamically or automated in Cassandra?

I can think of exporting Table A to a CSV file and loading it back into Table B every time Table A is flushed, but I would like to have something done at the database level itself. Any ideas or suggestions are appreciated.

Thanks, Arun

Arun.K

1 Answer


For smaller amounts of data, you could put this into cron:

https://dba.stackexchange.com/questions/58901/what-is-a-good-way-to-copy-data-from-one-cassandra-columnfamily-to-another-on-th
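For example, the cron job could run a small cqlsh script that snapshots A into B and then empties A. This is a minimal sketch along the lines of the linked answer; the keyspace/table names and file path are placeholders, and note that the sequence is not atomic (rows written to A while the script runs could be lost):

```sql
-- flush_a_to_b.cql (names are assumptions)
COPY my_ks.table_a TO '/tmp/table_a.csv';
COPY my_ks.table_b FROM '/tmp/table_a.csv';
TRUNCATE my_ks.table_a;
```

A crontab entry such as `*/10 * * * * cqlsh -f /path/to/flush_a_to_b.cql` would run it every 10 minutes, matching the load interval.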

If the data is larger and you are running a newer version of Cassandra (3.8+), you can use change data capture (CDC):

http://cassandra.apache.org/doc/latest/operating/cdc.html
https://issues.apache.org/jira/browse/CASSANDRA-8844
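Per the CDC documentation, enabling it is a two-step configuration; the keyspace/table names below are placeholders:

```yaml
# cassandra.yaml (Cassandra 3.8+): enable the CDC log on this node
cdc_enabled: true
```

Then flag the table itself with `ALTER TABLE my_ks.table_a WITH cdc = true;`, after which that table's mutations are retained in the commit log's CDC segments for an external consumer to read.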

and then replay the data into the table that you need (via some outside process: a script, an app, etc.).

There are already some tools around for this, for example: https://github.com/carloscm/cassandra-commitlog-extract

You could use the samples there to cover your use case.

But for most use cases this is handled at the application level; writes are relatively cheap in Cassandra.
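As a sketch of what handling it at the application level can look like: on each 10-minute load, the ingest process first copies the current rows of A into B, then overwrites A with the fresh batch. The snippet below illustrates that dual-write pattern using in-memory dicts as stand-ins for the two tables; there are no real Cassandra calls here, and in practice each step would be a batch of INSERTs (and a TRUNCATE) issued through your driver:

```python
def flush_and_load(table_a, table_b, new_rows):
    """Copy table_a's current rows into table_b, then replace
    table_a's contents with the new batch (dual-write pattern)."""
    # 1. "Flush": move the existing data from A into B.
    table_b.update(table_a)   # stand-in for batched INSERTs into B
    table_a.clear()           # stand-in for TRUNCATE on A
    # 2. Load the fresh 10-minute batch into A.
    table_a.update(new_rows)  # stand-in for batched INSERTs into A

table_a = {1: "row-1", 2: "row-2"}   # current contents of A
table_b = {}                         # history table B
flush_and_load(table_a, table_b, {3: "row-3"})
print(table_a)  # {3: 'row-3'}
print(table_b)  # {1: 'row-1', 2: 'row-2'}
```

Since the application already knows when a new batch arrives, it can do both writes itself, which avoids any database-level trigger machinery.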

Marko Švaljek
  • Apologies, I was away for a few days. My requirement actually changed. The new requirement is that Spark should pick up a file every 10 minutes, update one Cassandra table, and insert into another Cassandra table. How do I do CDC on Cassandra using Spark? – Arun.K Apr 19 '17 at 18:36
  • No problem. Spark is about as versatile with data processing as it gets. The easiest would be to do it as described here: http://stackoverflow.com/questions/32451614/reading-from-cassandra-using-spark-streaming – Marko Švaljek Apr 19 '17 at 19:41