Cassandra Old data deletion

Question

In Cassandra, we only need to keep 100 days of data for specific tables. However, we set the TTL value only recently, so data older than that is still in the system as stale data. We are considering different approaches to remove the old data. One suggestion was to create a Spark job that identifies the data older than a specific timeframe and deletes it. Another was to create a new table containing only the last 100 days of data and drop the old table. But I have several doubts about

how to rename the table while live data is being written to it,
how Cassandra will deal with such a table. If I recreate the table with less data and rename it on one node (say node 1), will the other nodes in the cluster automatically delete the older data from their copies, or will they sync with node 1 and push all the older data back onto it?

I am really new to Cassandra and would appreciate expert advice on this. Please suggest better ways to handle this if there are any.

score 1 · Answer 1 · answered Nov 03 '21 at 16:19

Cassandra does not have a way to rename a table. You will need to:

create the new table with a different name
ensure this table is created with the TTL setting
load into it only the subset of records that you are interested in; this could be tricky because the query depends on the schema of the table (for example, is the column with the timestamp part of the clustering key?)
update your application to point to the new table
drop the old table
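
One detail to watch in the load step: a TTL counts from the moment a row is written, so rows copied into the new table would start a fresh 100-day clock unless each insert carries its own TTL. Below is a minimal Python sketch of that calculation, not a definitive implementation. It assumes you can obtain each row's original write time (for example from your timestamp column). The table name `ks.events_v2`, the column names, and both helper functions are hypothetical; the `?` placeholders assume a prepared statement.

```python
from datetime import datetime, timedelta, timezone

# Retention window taken from the question: keep 100 days of data.
RETENTION = timedelta(days=100)


def remaining_ttl_seconds(written_at, now, retention=RETENTION):
    """Return how many seconds a row still has to live.

    Returns None when the row is already past the retention window,
    meaning it should not be copied into the new table at all.
    """
    expires_at = written_at + retention
    remaining = int((expires_at - now).total_seconds())
    return remaining if remaining > 0 else None


def build_insert(table, columns, ttl_seconds):
    """Build a parameterised CQL INSERT that carries a per-row TTL."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({marks}) USING TTL {ttl_seconds}"


if __name__ == "__main__":
    now = datetime(2021, 11, 3, tzinfo=timezone.utc)
    written_at = now - timedelta(days=30)  # hypothetical row written 30 days ago
    ttl = remaining_ttl_seconds(written_at, now)
    if ttl is not None:
        # Hypothetical table and column names, for illustration only.
        print(build_insert("ks.events_v2", ["id", "ts", "payload"], ttl))
```

Rows for which `remaining_ttl_seconds` returns None are simply skipped, which is what keeps the stale data out of the new table.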
